Cell biology: Networks, regulation, pathways
This review was written for the Encyclopedia of Complexity and System Science (Springer-Verlag,
Berlin, 2008), and is intended as a guide to the growing literature which approaches the phenomena 
of cell biology from a more theoretical point of view. We begin with the building blocks of cellular 
networks, and proceed toward the different classes of models being explored, finally discussing the 
'design principles' which have been suggested for these systems. Although largely a dispassionate 
review, we do draw attention to areas where there seems to be general consensus on ideas that

Biological network has come to mean a system of inter-
acting molecules that jointly perform cellular tasks such
as the regulation of gene expression, information trans-
mission, or metabolism (Bray, 1995). Specific instances
of biological networks include, for example, the DNA and 
DNA binding proteins comprising the transcriptional reg- 
ulatory network; signaling proteins and small molecules 
comprising various signaling networks; or enzymes and 
metabolites comprising the metabolic network. Two im- 
portant assumptions shape our current understanding of 
such systems: first, that the biological networks have 
been under selective evolutionary pressure to perform
specific cellular functions in a way that furthers the over- 
all reproductive success of the individual; and second, 
that these functions often are not implemented on a mi- 
croscopic level by single molecules, but are rather a col- 
lective property of the whole interaction network. The 
question of how complex behavior emerges in a network 
of (simple) nodes under a functional constraint is thus
central.



To start off with a concrete example, consider chemo-
taxis in the bacterium Escherichia coli (Berg, 1975;
Falke et al., 1997), one of the paradigmatic examples of
signal transduction. This system is dedicated to steering 
the bacteria towards areas high in nutrient substances 
and away from repellants. Chemoeffector molecules 
in the solution outside the bacterium bind to receptor 
molecules on the cell surface, and the resulting struc- 
tural changes in the receptors are relayed in turn by the 
activities of the intracellular signaling proteins to gener- 
ate a control signal for molecular motors that drive the 
bacterial flagella. The chemotactic network consists of 
about 10 nodes (here, signaling proteins), and the in- 
teractions between the nodes are the chemical reactions 
of methylation or phosphorylation. Notable features of 
this system include its extreme sensitivity, down to the 
limits set by counting individual molecules as they ar-
rive at the cell surface (Berg and Purcell, 1977), and the
maintenance of this sensitivity across a huge dynamic
range, through an adaptation mechanism that provides
nearly perfect compensation of background concentra-
tions (Block et al., 1983). More recently it has been
appreciated that aspects of this functionality, such as
perfect adaptation, are also robust against large varia-
tions in the concentrations of the network components
(Barkai and Leibler, 1997).
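
The idea of robust perfect adaptation can be illustrated with a deliberately minimal toy model. It is not the chemotaxis network itself, and the functional forms are assumptions made only for illustration: the receptor activity is taken to be a = m/(1 + L), where m is a slowly changing modification level and L a dimensionless ligand concentration, and the modification rate depends on the activity alone, dm/dt = k_r - k_b a. Under these assumptions the adapted activity is k_r/k_b for any background L. The names `activity`, `simulate`, `k_r` and `k_b` are hypothetical.

```python
"""Toy integral-feedback model of adaptation (illustrative assumptions only)."""


def activity(m, ligand):
    # Assumed form: activity rises with the modification level m and is
    # suppressed by the (dimensionless) attractant concentration.
    return m / (1.0 + ligand)


def simulate(ligand_steps, k_r=1.0, k_b=2.0, dt=0.05, t_per_step=2000.0):
    """Euler-integrate dm/dt = k_r - k_b * a through a series of ligand steps.

    Returns, for each step, the activity just after the ligand change and
    the activity once the system has relaxed.
    """
    m = 0.0
    results = []
    for ligand in ligand_steps:
        a_initial = activity(m, ligand)
        for _ in range(int(t_per_step / dt)):
            m += dt * (k_r - k_b * activity(m, ligand))
        results.append((a_initial, activity(m, ligand)))
    return results


if __name__ == "__main__":
    ligands = (0.0, 1.0, 10.0, 100.0)
    for k_b in (1.0, 2.0):
        print("k_b =", k_b)
        for ligand, (a0, a1) in zip(ligands, simulate(ligands, k_b=k_b)):
            print(f"  ligand={ligand:6.1f}  transient={a0:.3f}  adapted={a1:.3f}")
```

In this sketch the transient response depends on the size of the ligand step, while the adapted activity depends only on the ratio k_r/k_b; changing the rate constants shifts the adapted level but does not destroy adaptation, which is the sense of robustness discussed above.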



Abstractly, different kinds of signaling proteins, such as
those in chemotaxis, can be thought of as the building
blocks of a network, with their biochemical interactions 
forming the wiring diagram of the system, much like the 
components and wiring diagram of, for instance, a radio 
receiver. In principle, these wiring diagrams are hugely 
complex; for a network composed of N species, there are
$\sim \binom{N}{k}$ possible connections among any set of k compo-
nents, and typically we don't have direct experimental
guidance about the numbers associated with each 'wire.'
One approach is to view this as a giant fitting problem:
once we draw a network, there is a direct translation of 
this graph into dynamical equations, with many parame- 
ters, and we should test the predictions of these dynam- 
ics against whatever data are available to best determine 
the underlying parameters. Another approach is to ask 
whether this large collection of parameters is special in 
any way other than that it happens to fit the data: are
there principles that allow us to predict how these sys- 
tems should work? In the context of chemotaxis, we 
might imagine that network parameters have been se- 
lected to optimize the average progress of bacteria up 
the chemical gradients of nutrients, or to maximize the 
robustness of certain functions against extreme param- 
eter variations. These ideas of design principles clearly 
are not limited to bacterial chemotaxis. 

An important aspect of biological networks is that the 
same components (or components that have an easily 
identifiable evolutionary relationship) can be (re)used in 
different modules or used for the same function in a dif-
ferent way across species, as discussed for example by
Rao et al. (2004) for the case of bacterial chemotaxis.
Furthermore, because evolutionary selection depends on 
function and not directly on microscopic details, different 
wiring diagrams or even changes in components them-
selves can result in the same performance; evolutionary
processes can gradually change the structure of the net-
work as long as its function is preserved; as an example
see the discussion of transcriptional regulation in yeast
by Tanay et al. (2005). On the other hand, one can also
expect that signal processing problems like gain control,
noise reduction, ensuring (bi)stability, etc., have appeared
and been solved repeatedly, perhaps even in similar ways
across various cellular functions, and we might be able
to detect the traces of their commonality in the net-
work structure, as for example in the discussion of lo-
cal connectivity in bacterial transcriptional regulation by
Shen-Orr et al. (2002). Thus there are reasons to believe
that in addition to design principles at the network level, 
there might also be local organizing principles, similar 
to common wiring motifs in electronic circuitry, yet still 
independent of the identity of the molecules that imple- 
ment these principles. 

Biological networks have been approached at many dif-
ferent levels, often by investigators from different disci-
plines. The basic wiring diagram of a network (the fact
that a kinase phosphorylates these particular proteins,
and not all others, or that a transcription factor binds to
the promoter regions of particular genes) is determined
by classical biochemical and structural concepts such as 
binding specificity. At the opposite extreme, trying to 
understand the collective behavior of the network as a 
whole suggests approaches from statistical physics, often 
looking at simplified models that leave out many molecu-
lar details. Analyses that start with design principles are
yet a different approach, more in the 'top-down' spirit 
of statistical physics but leaving perhaps more room for 
details to emerge as the analysis is refined. Eventually, 
all of these different views need to converge: networks 
really are built out of molecules, their functions emerge 
as collective behaviors, and these functions must really 
be functions of use to the organism. At the moment, 
however, we seldom know enough to bridge the differ- 
ent levels of description, so the different approaches are 
pursued more or less independently, and we follow this 
convention here. We will start with the molecular build- 
ing blocks, then look at models for networks as a whole, 
and finally consider design principles. We hope that this 
sequence doesn't leave the impression that we actually 
know how to build up from molecules to function! 

Before exploring our subject in more detail, we take 
a moment to consider its boundaries. Our assignment 
from the editors was to focus on phenomena at the level of 
molecular and cellular biology. A very different approach 
attempts to create a 'science of networks' that searches 
for common properties in biological, social, economic and
computer networks (Newman et al., 2006). Even within
the biological world, there is a significant divide between 
work on networks in cell biology and networks in the 
brain. As far as we can see this division is an artifact of 
history, since there are many issues which cut across these 
different fields. Thus, some of the most beautiful work 
on signaling comes from photoreceptors, where the com- 
bination of optical inputs and electrical outputs allowed, 
already in the 1970s, for experiments with a degree of 
quantitative analysis that even today is hard to match in 
systems which take chemical inputs and give outputs that
modulate the expression levels of genes (Baylor et al.,
1979; Rieke and Baylor, 1998). Similarly, problems of
noise in the control of gene expression have parallels in 
the long history of work on noise in ion channels, as we
have discussed elsewhere (Tkačik et al., 2007), and the
problems of robustness have also been extensively ex-
plored in the network of interactions among the multiple
species of ion channels in the membrane (Goldman et al.,
2001; LeMasson et al., 1993). Finally, the ideas of collec-
tive behavior are much better developed in the context
of neural networks than in cellular networks, and it is an 
open question how much can be learned by studying these
different systems in the same language (Tkačik, 2007).



II. BIOLOGICAL NETWORKS AND THEIR BUILDING 
BLOCKS 

A. Genetic regulatory networks 

Cells constantly adjust their levels of gene expression. 
One central mechanism in this regulatory process in- 
volves the control of transcription by proteins known as 
transcription factors (TFs), which locate and bind short 
DNA sequences in the regulated genes' promoter or en-
hancer regions. A given transcription factor can regulate
either a few or a sizable proportion of the genes in a
genome, and a single gene may be regulated by more
than one transcription factor; different transcription fac-
tors can also regulate each other (Watson et al., 2003).

In the simplest case of a gene regulated by a single 
TF, the gene might be expressed whenever the factor - 
in this case called an activator - is bound to the cog- 
nate sequence in the promoter (which corresponds to the 
situation when the TF concentration in the nucleus is 
high), whereas the binding of a repressor would shut a 
normally active gene down. The outlines of these ba-
sic control principles were established long ago, well be-
fore the individual transcription factors could be iso-
lated, in elegant experiments on the lactose operon of
Escherichia coli (Jacob and Monod, 1961) and even sim-
pler model systems such as phage λ (Ptashne, 2004).
To a great extent the lessons learned from these ex-
periments have provided the framework for understand-
ing transcriptional control more generally, in prokaryotes
(Ptashne, 2001), eukaryotes (Kadonaga, 2004), and even
during the development of complex multicellular organ-
isms (Arnosti and Kulkarni, 2005).

The advent of high throughput techniques for prob-
ing gene regulation has extended our reach beyond single
genes. In particular, microarrays (Brown and Botstein,
1999) and the related data analysis tools, such as cluster-
ing (Eisen et al., 1998), have enabled researchers to find
sets of genes, or modules, that are coexpressed, i.e. up- or 
down-regulated in a correlated fashion when the organ- 
ism is exposed to different external conditions, and are 
thus probably regulated by the same set of transcription 
factors. Chromatin immunoprecipitation (ChIP) assays 
have made it possible to directly screen for short seg- 
ments of DNA that known TFs bind; using microarray 
technology it is then possible to locate the intergenic re- 
gions which these segments belong to, and hence find the 
regulated genes, as has recently been done for the Saccha-
romyces cerevisiae DNA-TF interaction map (Lee et al.,
2002).

These high throughput experimental approaches, com-
bined with traditional molecular biology and comple-
mented by sequence analysis and related mathematical
tools (Siggia, 2005), provide a large scale, topological
view of the transcriptional regulatory network of a par-
ticular organism, where each link between two nodes
(genes) in the regulatory graph implies either activa-
tion or repression (Alm and Arkin, 2003). While use-
ful for describing causal interactions and trying to pre-
dict responses to mutations and external perturbations
(Levine and Davidson, 2005), this picture does not ex-
plain how the network operates on a physical level: it
lacks dynamics and specifies neither the strengths of the 
interactions nor how all the links converging onto a given 
node jointly exercise control over it. To address these 
issues, representative wild-type or simple synthetic reg- 
ulatory elements and networks consisting of a few nodes 
have been studied extensively to construct quantitative 
models of the network building blocks. 



For instance, combinatorial regulation of a gene by
several transcription factors that bind and interact on
the promoter has been considered by Buchler et al.
(2003) as an example of (binary) biological computation,
and synthetic networks implementing such computations
have been created (Guet et al., 2002; Yokobayashi et al.,
2002). Building on classical work describing allosteric
proteins such as hemoglobin, thermodynamic models
have been used with success to account for combi-
natorial interactions on the operator of the lambda
phage (Ackers et al., 1982). More recently Bintu et al.
(2005a,b) have reviewed the equilibrium statistical me-
chanics of such interactions, Setty et al. (2003) have
experimentally and systematically mapped out the re-
sponse surface of the lac promoter to combinations
of its two regulatory inputs, cAMP and IPTG, and
Kuhlman et al. (2007) have finally provided a consistent
picture of the known experimental results and the ther-
modynamic model for the combinatorial regulation of the
lactose operon. There have also been some successes in
eukaryotic regulation, where Schroeder et al. (2004) used
thermodynamically motivated models to detect clusters
of binding sites that regulate the gap genes in morpho-
genesis of the fruit fly.
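
To make the thermodynamic picture concrete, the following is a minimal sketch of an equilibrium occupancy calculation for a hypothetical promoter; it is not the model of any of the papers cited above. We assume one activator site, one repressor site and one polymerase site, that the activator interacts favorably with polymerase (cooperativity factor `omega`), and that a bound repressor excludes polymerase. The dissociation constants `k_act` and `k_rep`, the bare polymerase weight `p`, and all numerical values are illustrative assumptions.

```python
"""Minimal thermodynamic (equilibrium occupancy) sketch of combinatorial control."""


def polymerase_occupancy(activator, repressor,
                         k_act=1.0, k_rep=1.0, p=0.01, omega=100.0):
    """Probability that polymerase is bound, from Boltzmann weights of states.

    States without polymerase: empty, A, R, A+R  -> (1 + a) * (1 + r)
    States with polymerase:    P, A+P            -> p * (1 + omega * a)
    (a bound repressor is assumed to exclude polymerase)
    """
    a = activator / k_act
    r = repressor / k_rep
    with_polymerase = p * (1.0 + omega * a)
    without_polymerase = (1.0 + a) * (1.0 + r)
    return with_polymerase / (with_polymerase + without_polymerase)


if __name__ == "__main__":
    # Coarse "response surface": rows scan activator, columns scan repressor.
    levels = (0.0, 0.1, 1.0, 10.0)
    print("act\\rep " + " ".join(f"{r:8.1f}" for r in levels))
    for a in levels:
        row = " ".join(f"{polymerase_occupancy(a, r):8.4f}" for r in levels)
        print(f"{a:7.1f} {row}")
```

If expression is taken to be proportional to this occupancy (a further assumption), the table is a crude analogue of a measured two-input response surface: expression is high only when the activator is abundant and the repressor scarce.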

Gene regulation is a dynamical process composed of a 
number of steps, for example the binding of TF to DNA, 
recruitment of transcription machinery and the produc- 
tion of the messenger RNA, post-transcriptional regula- 
tion, splicing and transport of mRNA, translation, mat- 
uration and possible localization of proteins. While the 
extensive palette of such microscopic interactions repre- 
sents a formidable theoretical and experimental challenge 
for each detailed study, on a network level it primarily 
induces three effects. First, each node - usually under- 
stood as the amount of gene product - in a graph of 
regulatory interactions is really not a single dynamical 
variable and thus has some internal state, representing 
the configuration on the associated promoter, concentra- 
tion of the corresponding messenger RNA etc.; the rela- 
tion of these quantities to the concentration of the output 
protein is not necessarily straightforward, as emphasized 
in recent work comparing mRNA and protein levels in
yeast (Ghaemmaghami et al., 2003). Second, collapsing
multiple chemical species onto a single node makes it dif- 
ficult to include non-transcriptional regulation of gene 
expression in the same framework. Third, the response 
of the target gene to changes in the concentrations of its 
regulators will be delayed and extended in time, as in the
example explored by Rosenfeld and Alon (2003).

Perhaps the clearest testimonies to the importance of 
dynamics in addition to network topology are provided 
by systems that involve regulatory loops, in which the 
output of a network feeds back on one of the inputs as
an activator or repressor. McAdams and Shapiro (1995)
have argued that the time delays in genetic regulatory
elements are essential for the proper functioning of the
phage λ switch, while Elowitz and Leibler (2000) have
created a synthetic circuit made up of three mutually re-
pressing genes (the "repressilator") that exhibits spon-
taneous oscillations. Circadian clocks are examples of
naturally occurring genetic oscillators (Young and Kay,
2001).
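
The way a ring of three repressors generates oscillations can be sketched with a dimensionless rate-equation model in the spirit of the repressilator. In this sketch each mRNA level m_i is produced at a rate repressed by the protein of the preceding gene in the ring and decays at unit rate, while each protein p_i relaxes toward its mRNA level at rate beta. The parameter values (`alpha`, `alpha0`, `beta`, Hill coefficient `n`), the initial condition and the function names are illustrative assumptions, not fitted quantities.

```python
"""Sketch of a three-gene ring oscillator (dimensionless rate equations)."""


def derivatives(state, alpha, alpha0, beta, n):
    m, p = state[:3], state[3:]
    # Gene i is repressed by the protein of gene i-1 (cyclically).
    dm = [-m[i] + alpha / (1.0 + p[(i - 1) % 3] ** n) + alpha0 for i in range(3)]
    dp = [-beta * (p[i] - m[i]) for i in range(3)]
    return dm + dp


def rk4_step(state, dt, *params):
    def shifted(k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]

    k1 = derivatives(state, *params)
    k2 = derivatives(shifted(k1, dt / 2), *params)
    k3 = derivatives(shifted(k2, dt / 2), *params)
    k4 = derivatives(shifted(k3, dt), *params)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(alpha=100.0, alpha0=0.1, beta=1.0, n=2.0, dt=0.01, t_end=200.0):
    """Return the time course of the first protein concentration."""
    state = [1.0, 0.0, 0.0, 2.0, 1.0, 3.0]  # asymmetric start: m1..m3, p1..p3
    trace = []
    for _ in range(int(t_end / dt)):
        state = rk4_step(state, dt, alpha, alpha0, beta, n)
        trace.append(state[3])
    return trace


def count_peaks(trace):
    return sum(1 for i in range(1, len(trace) - 1)
               if trace[i - 1] < trace[i] >= trace[i + 1])


if __name__ == "__main__":
    late = simulate()[10000:]  # discard the initial transient
    print("peaks:", count_peaks(late), "range:", min(late), max(late))
```

The same wiring diagram with weaker repression (smaller `alpha`) relaxes to a steady state instead, which is one way of seeing that topology alone does not determine the dynamics.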

In short, much is known about the skeleton of genetic 
regulatory interactions for model organisms, and physical 
models exist for several well studied (mostly prokaryotic) 
regulatory elements. While homology allows us to bridge 
the gap between model organisms and their relatives, it 
is less clear how and at which level of detail the knowl- 
edge about regulatory elements must be combined into a 
network to explain and predict its function. 



B. Protein-protein interaction networks 

After having been produced, proteins often assemble 
into complexes through direct contact interactions, and 
these complexes are functionally active units participat- 
ing in signal propagation and other pathways. Proteins 
also interact through less persistent encounters, as when 
a protein kinase meets its substrate. It is tempting to 
define a link in the network of protein-protein interac- 
tions by such physical associations, and this is the basis 
of several experimental methods which aim at a genome-
wide survey of these interactions. Although starting
out being relatively unreliable (with false positive rates
of up to 50%), high throughput techniques like the yeast
two hybrid assay (Ito et al., 2001; Uetz et al., 2000) or
mass spectrometry (Gavin et al., 2002; Ho et al., 2002)
are providing data of increasing quality about protein-
protein interactions, or the "interactome" (Krogan et al.,
2006). While more reliable methods are being devel-
oped (Alm and Arkin, 2003) and new organisms are be-
ing analyzed in this way (Giot et al., 2003; Li et al., 2004;
Rual et al., 2005), the existing interaction data from high
throughput experiments and curated databases have al-
ready been extensively studied.

Interpretation of the interactions in the protein net- 
work is tricky, however, due to the fact that different ex- 
perimental approaches have various biases - for example, 
mass spectrometry is biased towards detecting interac- 
tions between proteins of high abundance, while two hy- 
brid methods seem to be unbiased in this regard; on the 
other hand, all methods show some degree of bias towards 
different cellular localizations and evolutionary novelty 
of the proteins. Assessing such biases, however, cur- 
rently depends not on direct calibration of the methods 
themselves but on comparison of the results with man-
ually curated databases, although the databases surely
have their own biases (Jansen and Gerstein, 2004). It
is reassuring that the intersection of various experimen- 
tal results shows significantly improved agreement with 
the databases, but this comes at the cost of a substan-
tial drop in coverage of the proteome (von Mering et al.,
2002).

In contrast to the case of transcriptional regulation,
the relationship between two interacting proteins is sym-
metric: if protein A binds to protein B, B also binds to A,
so that the network is described by an undirected graph.
Most of the studies have been focused on binary inter-
actions that yeast two hybrid and derived approaches
can probe, although mass spectrometry can detect multipro-
tein complexes as well. Estimates of the number of links in
these networks vary widely, even in the yeast Saccha-
romyces cerevisiae: Krogan et al. (2006) directly mea-
sure around 7100 interactions (between 2700 proteins),
while Tucker et al. (2001) estimate the total to be around
13000-17000, and von Mering et al. (2002) would put the
lower estimate at about 30000. Apart from the experi-
mental biases that can influence such estimates and have
been discussed already, it is important to realize that
each experiment can only detect interactions between
proteins that are expressed under the chosen external
conditions (e.g. the nutrient medium); moreover, inter-
actions can vary from being transient to permanent, to
which various measurement methods respond differently.
It will thus become increasingly important to qualify each
interaction in a graph by specifying how it depends on
the context in which the interaction takes place.

Proteins ultimately carry out most of the cellular pro-
cesses such as transcriptional regulation, signal propaga-
tion and metabolism, and these processes can be modeled
by their respective network and dynamical system ab-
stractions. In contrast, the interactome is not a dynam-
ical system itself, but instead captures specific reactions
(like protein complex assembly) and structural and/or
functional relations that are present in all of the above
processes. In this respect it has an important practical
role of annotating currently unknown proteins through
'guilt by association', by tying them into complexes and
processes with a previously known function.



C. Metabolic networks

Metabolic networks organize our knowledge about an-
abolic and catabolic reactions between the enzymes, their
substrates and co-factors (such as ATP), by reducing the
set of reactions to a graph representation where two sub-
strates are joined by a link if they participate in the same
reaction. For model organisms like the bacterium Es-
cherichia coli the metabolic networks have been stud-
ied in depth and are publicly available (Kanehisa et al.,
2002; Karp et al., 2002), and an increasing number of
analyzed genomes offers sufficient sampling power to
make statistical statements about the network properties
across different domains of life (Jeong et al., 2000).

Several important features distinguish metabolic from 
protein-protein interaction and transcriptional regula- 
tion networks. First, for well studied systems the cov- 
erage of metabolic reactions is high, at least for the 
central routes of energy metabolism and small molecule 
synthesis; notice that this is a property of our knowl- 
edge, not a property of the networks (!). Second, cellular 
concentrations of metabolites usually are much higher
than those of transcription factors, making the stochas-
ticity in reactions due to small molecular counts irrel-
evant. Third, knowledge of the stoichiometry of re-
actions allows one to directly write down a system of
first order differential equations for the metabolite con-
centrations in terms of the reaction fluxes
(Heinrich and Schuster, 1996), which in steady state re-
duces to a set of linear constraints on the space of so-
lutions. These chemical constraints go beyond topology
and can yield strong and testable predictions; for exam-
ple, Ibarra et al. (2002) have shown how computationally
maximizing the growth rate of Escherichia coli within 
the space of allowed solutions given by flux balance con- 
straints can correctly predict measurable relationships 
between oxygen and substrate uptake, and that bacte- 
ria can be evolved towards the predicted optimality for 
growth conditions in which the response was initially sub- 
optimal. 
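
The structure of this argument can be written compactly. The notation here is generic and is our assumption rather than that of the cited papers: $x_i$ is the concentration of metabolite $i$, $v_j$ the flux through reaction $j$, $S_{ij}$ the stoichiometric coefficient of metabolite $i$ in reaction $j$, and $c_j$ the weights that define the growth objective.

```latex
\begin{aligned}
\frac{dx_i}{dt} &= \sum_j S_{ij}\, v_j
  && \text{(mass balance)} \\
0 &= \sum_j S_{ij}\, v_j
  && \text{(steady state: linear constraints on the fluxes)} \\
v_j^{\min} &\le v_j \le v_j^{\max}
  && \text{(capacity and reversibility bounds)} \\
v^{*} &= \arg\max_{v} \sum_j c_j\, v_j
  && \text{(growth maximization within the allowed space)}
\end{aligned}
```

Because both the constraints and the objective are linear in the fluxes, the maximization is a linear program; no kinetic parameters enter, only stoichiometry and bounds.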



D. Signaling networks 

Signaling networks consist of receptor and signaling 
proteins that integrate, transmit and route information 
by means of chemical transformations of the network con-
stituents. One class of such transformations, for example,
is post-translational modification, where targets are
phosphorylated, methylated, acetylated, ... on specific
residues, with a resulting change in their enzymatic (and 
thus signaling) activity. Alternatively, proteins might 
form stable complexes or dissociate from them, again 
introducing states of differential activity. The ability 
of cells to modify or tag proteins (possibly on several 
residues) can increase considerably the cell's capacity to 
encode its state and transmit information, assuming that 
the signaling proteins are highly specific not only for the 
identity but also the modification state of their targets;
for a review see Papin et al. (2005).

Despite the seeming overlap between the domains of 
protein-protein interaction networks and signaling networks, the fo-
cus of the analysis is substantially different. The inter-
actome is simply a set of possible protein-protein inter-
actions and thus a topological (or connectivity) map; in
contrast, signaling networks aim to capture signal trans-
duction and therefore need to establish a causal map, in
which the nature of the protein-protein interaction, its
direction and timescale, and its quantitative effect on the
activity of the target protein matter. As an example, see
the discussion by Kolch (2000) of the role of protein-
protein interactions in the MAPK signaling cascade.

Experiments on some signaling systems, such as the 
Escherichia coli chemotactic module, have generated 
enough experimental data to require detailed models in 
the form of dynamical equations. Molecular processes in 
a signaling cascade extend over different time scales, from 
milliseconds required for kinase and phosphatase reac- 
tions and protein conformational changes, to minutes or 
more required for gene expression control, cell movement 
and receptor trafficking; this fact, along with the (often
essential) spatial effects such as the localization of signal- 
ing machinery and diffusion of chemical messengers, can 
considerably complicate analyses and simulations. 

Signaling networks are often factored into pathways 
that have specific inputs, such as the ligands of the G 
protein coupled receptors on the cell surface, and spe- 
cific outputs, as with pathways that couple to the tran- 
scriptional regulation apparatus or to changes in the 
intracellular concentration of messengers such as cal- 
cium or cyclic nucleotides. Nodes in signaling net- 
works can participate in several pathways simultaneously, 
thus enabling signal integration or potentially inducing 
damaging "crosstalk" between pathways; how junctions 
and nodes process signals is an area of active research
(Jordan et al., 2000).

The components of signaling networks have long been 
the focus of biochemical research, and genetic methods 
allow experiments to assess the impact of knocking out or
over-expressing particular components. In addition, sev-
eral experimental approaches are being designed specif-
ically for elucidating signaling networks. Ab-chips lo-
calize various signaling proteins on chips reminiscent of
DNA microarrays, and stain them with appropriate fluo-
rescent antibodies (Nielsen et al., 2003). Multicolor flow
cytometry is performed on cells immuno-stained for sig-
naling protein modifications, and hundreds of simultaneous
single-cell measurements of the modification state of
pathway nodes are collected (Perez and Nolan, 2002).
Indirect inference of signaling pathways is also possible 
from genomic or proteomic data. 

One well studied signal transduction system is the 
mitogen activated protein kinase (MAPK) cascade that 
controls, among other functions, cell proliferation and
differentiation (Chang and Karin, 2001). Because this
system is present in all eukaryotes and its structural 
components are used in multiple pathways, it has been 
chosen as a paradigm for the study of specificity and 
crosstalk. Similarly, the TOR system, identified initially 
in yeast, is responsible for integrating the information on 
nutrient availability, growth factors and energy status of 
the cell and correspondingly regulating the cell growth
(Martin and Hall, 2005). Another interesting example
of signal integration and both intra- and inter-cellular 
communication is observed in the quorum sensing circuit 
of the bacterium Vibrio harveyi, where different kinds 
of species- and genus-specific signaling molecules are de- 
tected by their cognate receptors on the cell surface, and 
the information is fed into a common Lux phosphorelay 
pathway which ultimately regulates the quorum sensing
genes (Waters and Bassler, 2005).



III. MODELS OF BIOLOGICAL NETWORKS 

A. Topological models 

The structural features of a network are captured by its 
connectivity graph, where interactions (reactions, struc-
tural relations) are depicted as the links between the in-
teracting nodes (genes, proteins, metabolites). Informa-
tion about connectivity clearly cannot and does not de-
scribe the network behavior, but it might influence and
constrain it in revealing ways, similar to the effect that the
topology of the lattice has on the statistical mechanics of 
systems living on it. 

Theorists have studied extensively the properties of 
regular networks and random graphs, starting with Erdős
and Rényi in the 1960s. The former are characterized
by the high symmetry inherent in a square, triangular, or
all-to-all (mean field) lattice; random graphs lack
such regularity and are constructed simply by distribut-
ing K links at random between N nodes. The simple
one-point statistical characterization that distinguishes
random from regular networks looks at the node degree
distribution, that is, the probability P(k) that any node has k incoming
and/or outgoing links. For random graphs this distribu-
tion is Poisson, meaning that most of the nodes have de-
grees very close to the mean, $\langle k \rangle = \sum_k k P(k)$, although
there are fluctuations; for regular lattices every node has
the same connectivity to its neighbors.
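
The Poisson form of the degree distribution is easy to check numerically. The sketch below distributes K distinct undirected links at random among N nodes and compares the empirical P(k) with a Poisson distribution of the same mean, 2K/N. The function names and the choice N = 2000, K = 4000 are ours, for illustration only.

```python
"""Degree distribution of a random graph with N nodes and K random links."""
import math
import random
from collections import Counter


def random_graph_degrees(n_nodes, n_links, seed=0):
    rng = random.Random(seed)
    degrees = [0] * n_nodes
    links = set()
    while len(links) < n_links:
        i, j = rng.sample(range(n_nodes), 2)  # two distinct nodes
        link = (min(i, j), max(i, j))
        if link not in links:                  # no duplicate links
            links.add(link)
            degrees[i] += 1
            degrees[j] += 1
    return degrees


def poisson(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    n, n_links = 2000, 4000
    degrees = random_graph_degrees(n, n_links)
    mean_k = sum(degrees) / n  # equals 2K/N
    histogram = Counter(degrees)
    print(" k  empirical  Poisson")
    for k in range(12):
        print(f"{k:2d}  {histogram.get(k, 0) / n:9.4f}  {poisson(k, mean_k):7.4f}")
```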

The first analyses of the early reconstructions of large 
metabolic networks revealed a surprising "scale free" 
node degree distribution, that is $P(k) \sim k^{-\gamma}$, with $\gamma$
between 2 and 3 for most networks. For the physics
community, which had seen the impact of such scale in-
variance on our understanding of phase transitions, these
observations were extremely suggestive. It should be em-
phasized that for many problems in areas as diverse as
quantum field theory, statistical mechanics and dynam-
ical systems, such scaling relations are much more than
curiosities. Power laws relating various experimentally
observable quantities are exact (at least in some limit),
and the exponents (here, $\gamma$) really contain everything one
might want to know about the nature of order in the 
system. Further, some of the first thoughts on scaling 
emerged from phenomenological analyses of real data.
Thus, the large body of work on scaling ideas in theo- 
retical physics set the stage for people to be excited by 
the experimental observation of power laws in much more 
complex systems, although it is not clear to us whether 
the implied promise of connection to a deeper theoretical 
structure has been fulfilled. For divergent views on these
matters see Barabási (2002) and Keller (2005).

The most immediate practical consequence of a scale
free degree distribution is that, relative to expecta-
tions based on random graphs, there will be an over-
representation of nodes with very large numbers of links,
as with pyruvate or coenzyme A in metabolic networks
(Jeong et al., 2000; Wagner and Fell, 2001). These are



meaning. On the theoretical side, removal of a sizeable 
fraction of nodes from a scale free network will neither 
increase the network diameter much, nor partitio n the 
network into equally sized parts ^Albert et aLl . l200"ol ). and 
it is tempting to think that this robustness is also biolog- 
ically significant. The scale free property has been ob- 
served in many non-biological contexts, such as the topol- 
ogy of social interactions, Worl d Wide Web link s, electri- 
cal power grid connectivity ... ( Strooatz\ , l200lh . A num- 
ber of models have been proposed for how such scaling 
might arise, and some of these ideas, such as growth by 
preferential attachment, have a vaguely biological flavo r 
(Barabasi and Albert Il999t \Barabasi and Oltvai 12004 . 
Finding the properties of networks that actually discrim- 
inate among different mechanisms o f evolution or gro wth 
turns out to be surprisingly subtle { Ziv et al\ , l2005af ). 

Two other revealing measures are regularly computed 
for biological networks. The mean path length, (I), is the 
shortest path between a pair of nodes, averaged over all 
pairs in the graph, and measures the network's overall 
'navigability.' Intuitively, short path lengths correspond 
to, for example, efficient or fast flow of information and 
energy in signaling or metabolic networks, quick spread 
of diseases in a social network and so on. The clustering 
coefficient of a node i is defined as Cj = 2rii/ki(ki — 1), 
where n% is the number of links connecting the fcj neigh- 
bors of node i to each other; equivalently, C, is the ra- 
tio between the number of triangles passing through two 
neighbors of i and node i itself, divided by the maximum 
possible number of such triangles. Random networks 
have low path lengths and low clustering coefficients, 
whereas regular la ttices have long pat h lengt hs and are 
locally clustered. I Watts and Stroaatz] (fl998) have con- 
structed an intermediate regime of "small world" net- 
works, where the regular lattice has been perturbed by 
a small number of random links connecting distant parts 
of the network together. These networks, although not 
necessarily scale free, have short path lengths and high 
clustering coefficients, a property that was subsequently 
obser ved in metabolic and o ther biological networks as 
well {[Waaner and Fe»ll200ll) . 

A high clustering coefficient suggests the existence of 
densely connected groups of nodes within a network, 
which seems contradictory to the idea of scale invari- 
ance, in which ther e is no inherent group or cluster size; 
\Ravasz et all ( 20021 ) addressed this problem by introduc- 



sometimes called hubs, although another consequence of 
a scale free distribution is that there is no 'critical de- 
gree of connection' that distinguishes hubs from non- 
hubs. In the protein-protein interaction network of Sac- 
charomyces cerevisiae, nodes with hi gher degree are mor e 
likely to represent essential proteins ( Jeona et all [2001). 
suggesting that node degree does have some biological 



ing hierarchical networks and providing a simple con- 
struction for synthetic hierarchical networks exhibiting 
both scale free and clustering behaviors. Although there 
is no unique scale for the clusters, clusters will appear at 
any scale one chooses to look at, and this is revealed by 
the scaling of clustering coefficient C(k) with the node de- 
gree k, C(k) ~ fc _1 , on both synthetic as well as natural 
metab olic networks of org anisms from different domains 
of life \Ravasz et aZ.1 . 120021 ). Another interesting property 
of some biological networks is an anticorrelation of nod e 
degree of connected nodes ( Maslov and Snevven\ , 12002]) . 
which we can think of as a 'dissociative' structure, in con- 
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trast, for example, with the associative character of social 
networks, where well connected people usually know one 
another. 
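
As a minimal sketch of the three measures discussed above (the degree distribution P(k), the clustering coefficient of a node, and the mean shortest path length), the following Python code computes them for a small undirected graph stored as an adjacency dictionary. The toy graph, the function names, and the convention that a node with fewer than two neighbors has clustering coefficient zero are illustrative assumptions, not taken from the cited studies.

```python
from collections import Counter, deque
from itertools import combinations

def degree_distribution(adj):
    """Return P(k): the fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in counts.items()}

def clustering_coefficient(adj, i):
    """C_i = 2 n_i / (k_i (k_i - 1)), n_i = links among the neighbors of i."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than 2 neighbors
    n_i = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * n_i / (k * (k - 1))

def mean_path_length(adj):
    """Shortest-path length averaged over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 on node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

On this toy graph node 0 sits in a closed triangle (clustering coefficient 1), while node 2 has three neighbors of which only one pair is linked (clustering coefficient 1/3).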

As we look more finely at the structure of the graph 
representing a network, there is of course a much 
greater variety of things to look at. For example,
Spirin and Mirny (2003) have focused on high clustering
coefficients as a starting point and devised algorithms 
to search for modules, or densely connected subgraphs 
within the yeast protein-protein interaction network. Al- 
though the problem has combinatorial complexity in gen- 
eral, the authors found about 50 modules (of 5-10 pro- 
teins in size, some of which were unknown at the time) 
that come in two types: the first represents dynamic func- 
tional units (e.g. signaling cascades), and the second 
protein complexes. A similar conclusion was reached by
Han et al. (2004), after having analyzed the interactome
in combination with the temporal gene expression pro- 
files and protein localization data; the authors argue that 
nodes of high degree can sit either at the centers of mod- 
ules, which are simultaneously expressed ("party hubs"), 
or they can be involved in various pathways and modules 
at different times ("date hubs"). The former kind is at 
a lower level of organization, whereas the latter tie the 
network into one large connected component. 

Focusing on an even smaller scale, Shen-Orr et al.
(2002) have explored motifs, or patterns of connectivity
of small sets of nodes that are overrepresented in a
given network compared to the randomized networks of 
the same degree distribution P(k). In the transcriptional 
network of the bacterium E. coli, three such motifs were 
found: feed forward loops (in which gene X regulates 
Y that regulates Z, but X directly regulates Z as well), 
single input modules (where gene X regulates a large 
number of other genes in the same way and usually
autoregulates itself), and dense overlapping regulons (layers
of overlapping interactions between genes and a group 
of transcription factors, much denser than in random- 
ized networks). The motif approach has been extended 
to combined networks of transcriptional regulation and
protein-protein interactions (Yeger-Lotem et al., 2004)
in yeast, as well as to other systems (Milo et al., 2004).
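
To make the motif idea concrete, here is a hedged Python sketch that counts feed forward loops in a directed graph and builds a randomized comparison network with the same in- and out-degrees by repeated edge swaps. The edge list is a hypothetical toy example rather than real E. coli data, and the function names and the swap procedure are illustrative choices, not the specific algorithm of Shen-Orr et al.

```python
import random
from itertools import permutations

def count_feed_forward_loops(edges):
    """Count ordered triples (X, Y, Z) with X->Y, Y->Z and X->Z."""
    edge_set = set(edges)
    nodes = {n for e in edges for n in e}
    return sum(
        1
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize a directed graph by swapping the targets of two edges.

    Every node keeps its in-degree and out-degree, so the result is one
    sample from a null model with the same degree sequence.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges

# Hypothetical toy regulatory graph (not real data).
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
n_ffl = count_feed_forward_loops(toy_edges)
shuffled = degree_preserving_shuffle(toy_edges, n_swaps=100)
```

In an actual analysis one would compare the count in the real network with the distribution of counts over many such randomized networks; a motif is a pattern whose real count lies far in the upper tail of that distribution.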

At the risk of being overly pessimistic, we should con- 
clude this section with a note of caution. It would be 
attractive to think that a decade of work on network 
topology has resulted in a coherent picture, perhaps of 
the following form: on the smallest scale, the nodes of 
biological networks are assembled into motifs, these in 
turn are linked into modules, and this continues in a hi- 
erarchical fashion until the entire network is scale free. 
As we will discuss again in the context of design princi- 
ples, the notion of such discrete substructure — motifs and 
modules — is intuitively appealing, and some discussions 
suggest that it is essential either for the function or the 
evolution of networks. On the other hand, the evidence 
for such structure usually is gathered with reference to 
some null model (e.g., a random network with the same 
P(k)), so we don't even have an absolute definition of
these structures, much less a measure of their sufficiency
as a characterization of the whole system; for attempts at
an absolute definition of modularity see Ziv et al. (2005b)
and Hofman and Wiggins (2007). Similarly, while it is
appealing to think about scale free networks, the evi- 
dence for scaling almost always is confined to less than 
two decades, and in practice scaling often is not exact. It 
is then not clear whether the idealization of scale invari- 
ance captures the essential structure in these systems. 



B. Boolean networks 

A straightforward extension of the topological picture 
that also permits the study of network dynamics assumes 
that the entities at the nodes (for example, genes or
signaling proteins) are either 'on' or 'off' at each moment
of time, so that for node i the state at time t is
$\sigma_i(t) \in \{0, 1\}$. Time is usually discretized, and an
additional prescription is needed to implement the evolution
of the system: $\sigma_i(t+1) = f_i(\{\sigma_\mu(t)\})$, where $f_i$ is a
function that specifies how the states of the nodes $\mu$ that are
the inputs to node i in the interaction graph combine
to determine the next state at node i. For instance, $f_A$
might be a Boolean function for gene A, which needs to
have its activator gene B present and repressor gene C
absent, so that $\sigma_A(t+1) = \sigma_B(t) \wedge \bar{\sigma}_C(t)$. Alternatively,
$f$ might be a function that sums the inputs at time t
with some weights, and then sets $\sigma_i = 1\,(0)$ if the result
is above (below) a threshold, as in classical models of
neural networks.
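
The two kinds of update rule described above can be sketched in a few lines of Python. The three-gene circuit, the choice to hold B and C fixed as external inputs, and the particular weights and thresholds are assumptions made purely for illustration.

```python
def step_logic(state):
    """Synchronous update of a hypothetical 3-gene circuit.

    A is on at t+1 only if its activator B is on and its repressor C is
    off at time t; B and C are held fixed as external inputs (assumed).
    """
    return {
        "A": int(state["B"] and not state["C"]),
        "B": state["B"],
        "C": state["C"],
    }

def step_threshold(state, weights, thresholds):
    """Threshold rule: node i is on at t+1 if sum_j w_ij * state_j > theta_i."""
    return {
        i: int(sum(w * state[j] for j, w in weights[i].items()) > thresholds[i])
        for i in state
    }

# Hypothetical weights: B activates A (+1), C represses A (-1);
# B and C simply maintain their own state.
demo_weights = {"A": {"B": 1.0, "C": -1.0}, "B": {"B": 1.0}, "C": {"C": 1.0}}
demo_thresholds = {"A": 0.5, "B": 0.5, "C": 0.5}

start = {"A": 0, "B": 1, "C": 0}
after_logic = step_logic(start)
after_threshold = step_threshold(start, demo_weights, demo_thresholds)
```

With these particular weights the threshold rule reproduces the logical rule for every input state, which illustrates that simple Boolean functions such as "B and not C" can be written as thresholded weighted sums.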

Boolean networks are amenable both to analytical 
treatment and to efficient simulation. Early on, Kauffman
(1969) considered the family of random Boolean networks.
In these models, each node is connected at random
to K other nodes on average, and it computes a
random Boolean function of its inputs in which a fraction
p of the $2^K$ possible input combinations leads to
$\sigma_i(t+1) = 1$. In the limit that the network is large, the
dynamics are either regular (settling into a stable fixed
cycle) or chaotic, and these two phases are separated by
a separatrix $2p(1-p)K = 1$ in the phase space (p, K).
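
As a small worked illustration of the separatrix, the sketch below classifies a point (p, K) by the value of 2p(1 - p)K and returns the critical connectivity for a given bias p. The function names are hypothetical; the assignment of the chaotic phase to 2p(1 - p)K > 1 follows the standard result for random Boolean networks.

```python
def rbn_phase(p, K):
    """Classify a random Boolean network by the quantity 2 p (1 - p) K."""
    s = 2.0 * p * (1.0 - p) * K
    if s < 1.0:
        return "regular"
    if s > 1.0:
        return "chaotic"
    return "critical"

def critical_connectivity(p):
    """Connectivity K on the separatrix for a given bias p (0 < p < 1)."""
    return 1.0 / (2.0 * p * (1.0 - p))
```

For unbiased functions (p = 0.5) the critical connectivity is K = 2, so a network with K = 20 is deep in the chaotic phase unless the Boolean functions are strongly biased.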

Aldana and Cluzel (2003) have shown that for connectivities
of $K \sim 20$ that could reasonably be expected
in e.g. transcriptional regulatory networks, the chaotic
regime dominates the phase space. They point out, however,
that if the network is scale free, there is no 'typical'
K, as the distribution $P(k) \sim k^{-\gamma}$ does not have a
well-defined variance for $\gamma < 3$ (nor a well-defined mean for
$\gamma \le 2$), and the phase transition criterion
must be restated. It turns out, surprisingly, that regular
behavior is possible for values of $\gamma$ between 2 and 2.5,
observed in most biological networks, and this is exactly the
region where the separatrix lies. Scale free architecture,
at least for Boolean networks, seems to prevent chaos.

Several groups have used Boolean models to look at
specific biological systems. Thomas (1973) has established
a theoretical framework in which current states of
the genes (as well as the states in the immediate past)
and the environmental inputs are represented by Boolean
variables that evolve through the application of Boolean
functions. This work has been continued by, for example,
Sánchez and Thieffry (2001), who analyzed the gap-gene
system of the fruit fly Drosophila by building a Boolean 
network that generates the correct levels of gene expres- 
sion for 4 gap genes in response to input levels of 3 mater- 
nal morphogens with spatially varying profiles stretched 
along the anterior-posterior axis of the fly embryo. Inter- 
estingly, to reproduce the observed results and correctly 
predict the known Drosophila segmentation mutants, the 
authors had to introduce generalized Boolean variables 
that can take more than two states, and have identified 
the smallest necessary number of such states for each 
gene. 

In a similar spirit, Li et al. (2004) studied the skeleton
of the budding yeast cell cycle, composed of 11 nodes, using
a thresholding update rule. They found that the topology
of this small network generates a robust sequence of
transitions corresponding to known progression through yeast
cell-cycle phases G1 (growth), S (DNA duplication), G2
(pause) and M (division), triggered by a known 'cell-size 
checkpoint.' This progression is robust, in the sense that 
the correct trajectory is the biggest dynamical attractor 
of the system, with respect to various choices of update 
rules and parameters, small changes in network topology, 
and choice of triggering checkpoints. 

The usefulness of Boolean networks stems from the 
relative ease of implementation and simple parametriza- 
tions of network topology and dynamics, making them 
suitable for studying medium or large networks. In ad- 
dition to simplifying the states at the nodes to two (or 
more) discrete levels, which is an assumption that has 
not been clearly explored, one should be cautious that 
the discrete and usually synchronous dynamics in time 
can induce unwanted artifacts. 



C. Probabilistic models 

Suppose one is able to observe simultaneously the ac- 
tivity levels of several proteins comprising a signaling net- 
work, or the expression levels of a set of genes belonging 
to the same regulatory module. Because they are part of 
a functional whole, the activity levels of the components 
will be correlated. Naively, one could build a network 
model by simply computing pairwise correlation coeffi- 
cients between pairs of nodes, and postulating an inter- 
action, and therefore a link, between the two nodes when- 
ever their correlation is above some threshold. However, 
in a test case where A → B → C (gene A induces B which
induces C), one expects to see high positive correlation 
among all three elements, even though there is no (phys- 
ical) interaction between A and C. Correlation therefore 
is not equal to interaction or causation. Constructing 
a network from the correlations in this naive way also 
does not lead to a generative model that would predict 
the probabilities for observing different states of the
network as a whole. Another approach is clearly needed; see
Markowetz and Spang (2007) for a review.
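
The pitfall of thresholding correlations can be illustrated with a short simulation, offered here as a sketch under simple assumptions: binary variables, B copies A and C copies B, each with a hypothetical 10 percent chance of flipping. All names and parameter values are illustrative.

```python
import random

def simulate_chain(n_samples, flip=0.1, seed=0):
    """Sample binary (A, B, C) where B copies A and C copies B, with noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        a = rng.randint(0, 1)
        b = a if rng.random() > flip else 1 - a
        c = b if rng.random() > flip else 1 - b
        samples.append((a, b, c))
    return samples

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

data = simulate_chain(20000)
a, b, c = zip(*data)
r_ac = correlation(a, c)  # sizeable, although A never acts on C directly

# Within a fixed value of B, the A-C correlation should nearly vanish.
r_ac_given_b = {
    fixed: correlation(
        [x for x, y, _ in data if y == fixed],
        [z for _, y, z in data if y == fixed],
    )
    for fixed in (0, 1)
}
```

A correlation threshold would draw a spurious A-C link here, whereas conditioning on B removes the correlation; this is the conditional independence that the probabilistic models discussed next are designed to exploit.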

In the simple case where the activity of a protein/gene 
i can either be 'on' ($\sigma_i = 1$) or 'off' ($\sigma_i = 0$), the state of
a network with N nodes will be characterized by a binary 
word of N bits, and because of interaction between nodes, 
not all these words will be equally likely. For example, 
if node A represses node B, then combinations such as 
$1_A 0_B \cdots$ or $0_A 1_B \cdots$ will be more likely than $1_A 1_B \cdots$. In
the case of deterministic Boolean networks, having node 
A be 'on' would imply that node B is 'off' with certainty, 
but in probabilistic models it only means that there is 
a positive bias for node B to be 'off', quantified by the 
probability that node B is 'off' given that the state of 
node A is known. Having this additional probabilistic 
degree of freedom is advantageous, both because the net- 
work itself might be noisy, and because the experiment 
can induce errors in the signal readout, making the infer- 
ence of deterministic rules from observed binary patterns 
an ill-posed problem. 

Once we agree to make a probabilistic model, the 
goal is to find the distribution over all network states, 
which we can also think of as the joint distribution of 
all the N variables that live on the nodes of the network,
$P(\sigma_1, \ldots, \sigma_N | C)$, perhaps conditioned on some fixed set
of environmental or experimental factors C. The activities
of the nodes, $\sigma_i$, can be binary, can take on a discrete
set of states, or be continuous, depending on our prior 
knowledge about the system and experimental and nu- 
merical constraints. Even for a modest N, experiments 
of realistic scale will not be enough to directly estimate 
the probability distribution, since even with binary
variables the number of possible states, and hence the number
of parameters required to specify the general probability
distribution, grows as $\sim 2^N$. Progress thus depends in
an essential way on simplifying assumptions. 

Returning to the three gene example A → B → C, we
realize that C depends on A only through B, or in other
words, C is conditionally independent of A given B, and hence
no interaction should be assigned between nodes A and
C. Thus, the joint distribution of three variables can be
factorized,

$$P(\sigma_A, \sigma_B, \sigma_C) = P(\sigma_C | \sigma_B) \, P(\sigma_B | \sigma_A) \, P(\sigma_A).$$
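
The factorization can be checked explicitly on a small example. In the sketch below the conditional probability tables are hypothetical numbers chosen only for illustration; the code builds the joint distribution from the three factors and then recovers P(C | A, B) from the joint to confirm that it does not depend on A.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {0: 0.7, 1: 0.3}                   # P(A)
p_b_given_a = {0: {0: 0.9, 1: 0.1},      # P(B | A=0)
               1: {0: 0.2, 1: 0.8}}      # P(B | A=1)
p_c_given_b = {0: {0: 0.8, 1: 0.2},      # P(C | B=0)
               1: {0: 0.1, 1: 0.9}}      # P(C | B=1)

# Joint distribution assembled from the factorization P(C|B) P(B|A) P(A).
joint = {
    (a, b, c): p_c_given_b[b][c] * p_b_given_a[a][b] * p_a[a]
    for a, b, c in product((0, 1), repeat=3)
}

def conditional_c(c, a, b):
    """P(C=c | A=a, B=b), computed from the joint distribution."""
    return joint[(a, b, c)] / sum(joint[(a, b, cc)] for cc in (0, 1))

def marginal_ac(a, c):
    """P(A=a, C=c), obtained by summing the joint over B."""
    return sum(joint[(a, b, c)] for b in (0, 1))
```

The factorized model here needs 1 + 2 + 2 = 5 numbers instead of the 2^3 - 1 = 7 required for a general distribution over three binary variables. Note that A and C remain correlated in the marginal distribution even though they are conditionally independent given B.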

One might hope that, even in a large network, these 
sorts of conditional independence relations could be used 
to simplify our model of the probability distribution. 
In general this doesn't work, because of feedback loops 
which, in our simple example, would include the pos- 
sibility that C affects the state of A, either directly or 
through some more circuitous path. Nonetheless one can 
try to make an approximation in which loops either are 
neglected or (more sensibly) taken into account in some 
sort of average way; in statistical mechanics, this
approximation goes back at least to the work of Bethe (1935).

In the computer science and bioinformatics literature, 
the exploitation of Bethe-like approximations has come
to be known as 'Bayesian network modeling' (Friedman,
2004). In practice what this approach does is to search
among possible network topologies, excluding loops, and 
then for fixed topology one uses the conditional probabil- 
ity relationships to factorize the probability distribution 
and fit the tables of conditional probabilities at each node 
that will best reproduce some set of data. Networks with 
more links have more parameters, so one must introduce 
a tradeoff between the quality of the fit to the data and 
this increasing complexity. In this framework there is 
thus an explicit simplification based on conditional in- 
dependence, and an implicit simplification based on a 
preference for models with fewer links or sparse connec- 
tivity. 

The best known application of this approach to a
biological network is the analysis of the MAPK signaling
pathway in T cells from the human immune system
(Sachs et al., 2005). The data for this analysis come
from experiments in which the phosphorylated states of
11 proteins in the pathway are sampled simultaneously
by immunostaining (Perez and Nolan, 2002), with hundreds
of cells sampled for each set of external conditions.
By combining experiments from multiple conditions, the 
Bayesian network analysis was able to find a network of 
interactions among the 11 proteins that has high overlap 
with those known to occur experimentally. 

A very different approach to simplification of
probabilistic models is based on the maximum entropy
principle (Jaynes, 1957). In this approach one views a set
of experiments as providing an estimate of some set of
correlations, for example the $\sim N^2$ correlations among
all pairs of elements in the network. One then tries
to construct a probability distribution which matches
these correlations but otherwise has as little structure
(as much entropy) as possible. We recall that the Boltzmann
distribution for systems in thermal equilibrium can
be derived as the distribution which has maximum en- 
tropy consistent with a given average energy, and max- 
imum entropy modeling generalizes this to take account 
of other average properties. In fact one can construct 
a hierarchy of maximum entropy distributions which are 
consistent with higher and higher orders of correlation
(Schneidman et al., 2003). Maximum entropy models for
binary variables that are consistent with pairwise corre- 
lations are exactly the Ising models of statistical physics, 
which opens a wealth of analytic tools and intuition 
about collective behavior in these systems. 
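
For a handful of binary variables the pairwise maximum entropy model can be fitted by brute force, which makes the construction concrete. The sketch below assumes variables taking values 0 or 1, a distribution of the form P(s) proportional to exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j), and plain gradient ascent on the log-likelihood; the target moments are generated from hypothetical parameters so that a solution is known to exist. None of the names or numbers come from the cited work, and this exhaustive enumeration is only practical for small N.

```python
from itertools import product
from math import exp

def model_stats(h, J, n):
    """Exact means <s_i> and pair moments <s_i s_j> by enumerating 2^n states."""
    states = list(product((0, 1), repeat=n))
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    weights = [
        exp(sum(h[i] * s[i] for i in range(n))
            + sum(J[p] * s[p[0]] * s[p[1]] for p in pairs))
        for s in states
    ]
    z = sum(weights)  # partition function
    means = [sum(w * s[i] for w, s in zip(weights, states)) / z
             for i in range(n)]
    corrs = {p: sum(w * s[p[0]] * s[p[1]] for w, s in zip(weights, states)) / z
             for p in pairs}
    return means, corrs

def fit_pairwise_maxent(target_means, target_corrs, n, rate=0.5, steps=20000):
    """Gradient ascent: nudge each parameter by (target moment - model moment)."""
    h = [0.0] * n
    J = {p: 0.0 for p in target_corrs}
    for _ in range(steps):
        means, corrs = model_stats(h, J, n)
        for i in range(n):
            h[i] += rate * (target_means[i] - means[i])
        for p in J:
            J[p] += rate * (target_corrs[p] - corrs[p])
    return h, J

# Hypothetical "true" parameters, used only to generate consistent targets.
true_h = [-0.5, 0.2, 0.1]
true_J = {(0, 1): 1.0, (0, 2): -0.5, (1, 2): 0.3}
target_means, target_corrs = model_stats(true_h, true_J, 3)
fit_h, fit_J = fit_pairwise_maxent(target_means, target_corrs, 3)
```

The update rule follows from the fact that the gradient of the log-likelihood with respect to each parameter is the difference between the measured moment and the model moment, so the fit stops changing exactly when the model reproduces the measured means and pairwise correlations.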

In the context of biological networks (broadly con- 
strued), recent work has shown that maximum entropy 
models consistent with pairwise correlations are surpris- 
ingly successful at describing the patterns of activity 
among populations of neurons in the vertebrate retina as
it responds to natural movies (Schneidman et al., 2006;
Tkačik et al., 2006). Similar results are obtained for very
different retinas under different conditions (Shlens et al.,
2006), and these successes have touched off a flurry of
interest in the analysis of neural populations more generally.
The connection to the Ising model has a special resonance 
in the context of neural networks, where the collective
behavior of the Ising model has been used for some time
as a prototype for thinking about the dynamics of
computation and memory storage (Hopfield, 1982); in the
maximum entropy approach the Ising model emerges di- 
rectly as the least structured model consistent with the 
experimentally measured patterns of correlation among 
pairs of cells. A particularly striking result of this
analysis is that the Ising models which emerge seem to be
poised near a critical point (Tkačik et al., 2006).
Returning to cell biology, the maximum entropy approach
has also been used to analyze patterns of gene expression
in yeast (Lezon et al., 2006) as well as to revisit the
MAPK cascade (Tkačik, 2007).



D. Dynamical systems 

If the information about a biological system is detailed 
enough to encompass all relevant interacting molecules 
along with the associated reactions and estimated reac- 
tion rates, and the molecular noise is expected to play a 
negligible role, it is possible to describe the system with 
rate equations of chemical kinetics. An obvious bene- 
fit is the immediate availability of mathematical tools, 
such as steady state and stability analyses, insight pro- 
vided by nonlinear dynamics and chaos theory, well de- 
veloped numerical algorithms for integration in time and 
convenient visualization with phase portraits or bifurca- 
tion diagrams. Moreover, analytical approximations can 
be often exploited productively when warranted by some 
prior knowledge, for example, in separately treating 'fast' 
and 'slow' reactions. In practice, however, reaction rates 
and other important parameters are often unknown or 
known only up to order-of-magnitude estimations; in this 
case the problem usually reduces to the identification of 
phase space regions where the behavior of the system is 
qualitatively the same, for example, regions where the 
system exhibits limit-cycle oscillations, bistability,
convergence into a single steady state etc.; see Tyson et al.
(2001) for a review. Despite the difficulties, deterministic
chemical kinetic models have been very powerful tools in 
analyzing specific network motifs or regulatory elements, 
as in the protein signaling circuits that achieve perfect 
adaptation, homeostasis, switching and so on described
by Tyson et al. (2003), and more generally in the analysis
of transcriptional regulatory networks as reviewed by
Hasty et al. (2001).

In the world of bacteria, some of the first detailed
computer simulations of the chemotaxis module of Escherichia
coli were undertaken by Bray et al. The signaling
cascade from the Tar receptor at the cell surface
to the modifications in the phosphorylation state of the
molecular motor was captured by Michaelis-Menten kinetic
reactions (and equilibrium binding conditions for
the receptor), and the system of equations was numerically
integrated in time. While slow adaptation kinetics
was not studied in this first effort, the model nevertheless
qualitatively reproduces about 80 percent of
examined chemotactic protein deletion and overexpression
mutants, although the extreme sensitivity of the system 
remained unexplained. 

In eukaryotes, Novak and Tyson (1997) have, for
instance, constructed an extensive model of cell cycle
control in fission yeast. Despite its complexity ($\sim 10$ proteins
and $\sim 30$ rate constants), Novak and colleagues have
provided an interpretation of the system in terms of three
interlocking modules that regulate the transitions from
G1 (growth) into S (DNA synthesis) phase, from G2 into
M (division) phase, and the exit from mitosis,
respectively. The modules are coupled through the cdc2/cdc13
protein complex and the system is driven by the interaction
with the cell size signal (proportional to the number of
ribosomes per nucleus). At small size, the control circuit
can only support one stable attractor, which is the state
with low cdc2 activity corresponding to G1 phase. As
the cell grows, a new stable state appears and the system
makes an irreversible transition into S/G2 at a bifurcation
point, and, at an even larger size, the mitotic module
becomes unstable and executes limit cycles in cdc2-cdc13
activity until the M phase is completed and the cell
returns to its initial size. The basic idea is that the cell,
driven by the size readout, progresses through robust
cell states created by bistability in the three modules
comprising the cell cycle control; in this way, once
it commits to a transition from G2 state into M, small
fluctuations will not flip it back into G2. The
mathematical model has in this case successfully predicted the
behaviors of a number of cell cycle mutants and
recapitulated experimental observations collected during the 1970s
and 1980s by Nurse and collaborators (Nurse, 2001).

The circadian clock is a naturally occurring transcriptional
module that is particularly amenable to dynamical
systems modeling. Leloup and Goldbeter (2003) have
created a mathematical model of a mammalian clock
(with ~ 20 rate equations) that exhibits autonomous sus- 
tained oscillations over a sizable range of parameter val- 
ues, and reproduces the entrainment of the oscillations 
to the light-dark cycles through light-induced gene ex- 
pression. The basic mechanism that enables the cyclic 
behavior is negative feedback transcriptional control, al- 
though the actual circuit contains at least two coupled 
oscillators. Studying the circadian clock in mammals, the
fruit fly Drosophila or Neurospora is attractive because
of the possibility of connecting a sizable catalogue of
physiological disorders in circadian rhythms to malfunctions
in the clock circuit and direct experimentation with
light-dark stimuli (Young and Kay, 2001). Recent experiments
indicate that at least in cyanobacteria the circadian
clock can be reconstituted from a surprisingly small set
of biochemical reactions, without transcription or
translation (Nakajima et al., 2005; Tomita et al., 2005), and
this opens possibilities for even simpler and highly
predictive dynamical models (Rust et al., 2007).

Dynamical modeling has in addition been applied to 
many smaller systems. For example, the construction of
a synthetic toggle switch (Gardner et al., 2000), and the
'repressilator', an oscillating network of three mutually
repressing genes (Elowitz and Leibler, 2000), are examples
where mathematical analysis has stimulated the design of
synthetic circuits. A successful reaction-diffusion model
of how localization and complex formation of Min proteins
can lead to spatial limit cycle oscillations (used by
Escherichia coli to find its division site) was constructed
by Huang et al. (2003). It remains a challenge, nevertheless,
to navigate in the space of parameters as it becomes
ever larger for bigger networks, to correctly account for 
localization and count various forms of protein modifica- 
tions, especially when the signaling networks also couple 
to transcriptional regulation, and to find a proper bal- 
ance between models that capture all known reactions 
and interactions and phenomenological models that in- 
clude coarse-grained variables. 
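
As a hedged illustration of the rate-equation approach, the sketch below integrates a dimensionless two-gene mutual repression model of the toggle switch type, du/dt = alpha/(1 + v^n) - u and dv/dt = alpha/(1 + u^n) - v, with a fixed-step Runge-Kutta scheme. The parameter values (alpha = 10, n = 2), the initial conditions, and the function names are assumptions chosen only to display bistability; they are not fitted to any experiment.

```python
def toggle_rhs(u, v, alpha=10.0, n=2.0):
    """Right-hand side of the mutual repression model (dimensionless)."""
    du = alpha / (1.0 + v ** n) - u
    dv = alpha / (1.0 + u ** n) - v
    return du, dv

def integrate(u, v, dt=0.01, t_end=50.0):
    """Fixed-step fourth-order Runge-Kutta integration up to time t_end."""
    for _ in range(int(t_end / dt)):
        k1 = toggle_rhs(u, v)
        k2 = toggle_rhs(u + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1])
        k3 = toggle_rhs(u + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1])
        k4 = toggle_rhs(u + dt * k3[0], v + dt * k3[1])
        u += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        v += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return u, v

high_u = integrate(2.0, 1.0)  # starts with u > v
high_v = integrate(1.0, 2.0)  # starts with v > u
```

The two runs differ only in which protein starts ahead, yet they settle into opposite steady states (one protein high, the other almost fully repressed). This dependence of the final state on initial conditions is the bistability that makes such a circuit a switch.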



E. Stochastic dynamics 

Stochastic dynamics is in principle the most detailed 
level of system description. Here, the (integer) count 
of every molecular species is tracked and reactions are 
drawn at random with appropriate probabilities per unit 
time (proportional to their respective reaction rates) and 
executed to update the current tally of molecular counts. 
An algorithm implementing this prescription, called the 
stochastic simulation algorithm or SSA, was devised by
Gillespie (1977); see Gillespie (2007) for a review of SSA
and a discussion of related methods. Although slow, this
approach to simulating chemical reactions can be made
exact. In general, when all molecules are present in large
numbers and continuous, well-mixed concentrations are 
good approximations, the (deterministic) rate dynamics 
equations and stochastic simulation give the same re- 
sults; however, when molecular counts are low and, con- 
sequently, the stochasticity in reaction timing and order- 
ing becomes important, the rate dynamics breaks down 
and SSA needs to be used. In biological networks and 
specifically in transcriptional regulation, a gene and its 
promoter region are only present in one (or perhaps a 
few) copies, while transcription factors that regulate it 
can also be at nanomolar concentrations (i.e. from a few 
to a few hundred molecules per nucleus), making stochastic
effects possibly very important (McAdams and Arkin,
1997, 1999).
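
The prescription described above can be sketched for the simplest possible case, a single molecular species that is produced at a constant rate and degraded with a rate proportional to its copy number. The reaction scheme, rate constants, and function name are illustrative assumptions; the two steps inside the loop (draw an exponential waiting time, then pick a reaction with probability proportional to its rate) are the core of the stochastic simulation algorithm.

```python
import random
from math import log

def gillespie_birth_death(k_make, k_decay, n0, t_end, seed=0):
    """SSA for 0 -> X (rate k_make) and X -> 0 (rate k_decay * n).

    Returns the final copy number and the time-averaged copy number.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    time_weighted_sum = 0.0
    while True:
        rates = (k_make, k_decay * n)
        total = rates[0] + rates[1]
        # Waiting time to the next reaction is exponential with rate `total`.
        dt = -log(1.0 - rng.random()) / total
        if t + dt > t_end:
            time_weighted_sum += n * (t_end - t)
            break
        time_weighted_sum += n * dt
        t += dt
        # Choose which reaction fires, with probability proportional to rate.
        if rng.random() * total < rates[0]:
            n += 1
        else:
            n -= 1
    return n, time_weighted_sum / t_end

final_n, mean_n = gillespie_birth_death(10.0, 1.0, n0=0, t_end=2000.0)
```

For this scheme the deterministic rate equation predicts a steady state of k_make / k_decay = 10 molecules; the simulated time average should fluctuate around that value, while individual trajectories show the integer-valued noise that the rate equation ignores.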

One of the pioneering studies of the role of noise in a 
biological system was a simulation of the phage λ
lysis-lysogeny switch by Arkin et al. (1998). The lifecycle of
the phage is determined by the concentrations of two
transcription factors, cI (lambda repressor) and cro, that
compete for binding to the same operator on the DNA.
If cI is prevalent, the phage DNA is integrated into the
host's genome and no phage genes except for cI are
expressed (the lysogenic state); if cro is dominant, the
phage is in the lytic state, using the cell's DNA replication
machinery to produce more phages and ultimately lyse the
host cell (Ptashne, 2003). The switch is bistable and the
fate of the phage depends on the temporal and random
pattern of gene expression of two mutually antagonistic 
transcription factors, although the balance can be shifted 
by subjecting the host cell to stress and thus flipping the 
toggle into lytic phase. The stochastic simulation cor- 
rectly reproduces the experimentally observed fraction of 
lysogenic phages as a function of multiplicity-of-infection. 
An extension of SSA to spatially extended models is pos- 
sible. 

Although the simulations are exact, they are compu- 
tationally intensive and do not offer any analytical in- 
sight into the behavior of the solutions. As a result, 
various theoretical techniques have been developed for 
studying the effects of stochasticity in biological net- 
works. These are often operating in a regime where the 
deterministic chemical kinetics is a good approximation, 
and noise (i.e. fluctuation of concentrations around the 
mean) is added into the system of differential equations as 
a perturbation; these Langevin methods have been useful
for the study of noise propagation in regulatory networks
(Paulsson, 2004; Thattai and van Oudenaarden,
2001; van Kampen, 2007). The analysis of stochastic
dynamics is especially interesting in the context of design
principles which consider the reliability of network func- 
tion, to which we return below. 



IV. NETWORK PROPERTIES AND OPERATING 
PRINCIPLES 

A. Modularity 

Biological networks are said to be modular, although 
the term has several related but nevertheless distinct 
meanings. Their common denominator is the idea that 
there exists a partitioning of the network nodes into
groups, or modules, that are largely independent of each 
other and perform separate or autonomous functions. In- 
dependence can be achieved through spatial isolation of 
the module's processes or by chemical specificity of its 
components. The ability to extract the module from the 
cell and reconstitute it in vitro, or transplant it to another 
type of cell is a powerful argument for the existence of
modularity (Hartwell et al., 1999). In the absence of such
strong and laborious experimental verifications, however, 
measures of modularity that depend on a particular net- 
work model are frequently used. 

In topological networks, the focus is on the module's 
independence: nodes within a module are densely con- 
nected to each other, while inter-modular links are sparse 
(Han et al., 2004; Ravasz et al., 2002; Spirin and Mirny,
2003), and the tendency to cluster is measured by high
clustering coefficients. As a caveat to this view, note
that despite their sparseness the inter-module links could
represent strong dynamical couplings. Modular architecture
has been studied in Boolean networks by
Kashtan and Alon (2005), who have shown that modularity
can evolve by mutation and selection in a time-varying
fitness landscape where changeable goals decompose
into a set of fixed subproblems. In the example studied
they computationally evolve networks implementing
several Boolean formulae and observe the appearance of 
a module - a circuit of logical gates implementing a par- 
ticular Boolean operator (like XOR) in a reusable way. 
This work makes clear that modularity in networks is 
plausibly connected to modularity in the kinds of prob- 
lems that these networks were selected to solve, but we 
really know relatively little about the formal structure of 
these problems. 
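
The clustering coefficient used as a topological signature of modularity can be made concrete with a small example. The sketch below computes the local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked) for a hypothetical six-node graph made of two triangles joined by a single inter-module link; the graph and the function name are illustrative assumptions.

```python
from itertools import combinations

def clustering_coefficients(adjacency):
    """Local clustering coefficient C_i = 2 e_i / (k_i (k_i - 1)) for each node,
    where k_i is the degree and e_i the number of links among the neighbours."""
    coeffs = {}
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            coeffs[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(sorted(neighbours), 2)
                    if b in adjacency[a])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs

# Two triangular "modules" (a, b, c) and (d, e, f) joined by the link c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

coeffs = clustering_coefficients(adjacency)
mean_c = sum(coeffs.values()) / len(coeffs)
for node in sorted(coeffs):
    print(node, round(coeffs[node], 3))
```

Nodes inside a module have coefficient 1, while the two nodes carrying the inter-module link have a lower value; as noted in the text, this purely topological measure says nothing about how strong the dynamical coupling across that link is.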

There are also ways of inferring a form of modularity 
directly without assuming any particular network model. 
Clustering tools partition genes into co-expressed groups, 
or clusters, that are often identified with particular modules
(Eisen et al., 1998; Segal et al., 2003; Slonim et al.,
2005). Ihmels et al. (2002) have noted that each node can
belong to more than one module depending on the biological
state of the cell, or the context, and have correspondingly
reexamined the clustering problem. Elemento et al.
(2007) have recently presented a general information
theoretic approach to inferring regulatory modules and the
associated transcription factor binding sites from various 
kinds of high-throughput data. While clustering meth- 
ods have been widely applied in the exploration of gene 
expression, it should be emphasized that merely finding 
clusters does not by itself provide evidence for modu- 
larity. As noted above, the whole discussion would be 
much more satisfying if we had independent definitions 
of modularity and, we might add, clearly stated alter- 
native hypotheses about the structure and dynamics of 
these networks. 

Focusing on the functional aspect of the module, we 
often observe that the majority of the components of a 
system (for instance, a set of promoter sites or a set of 
genes regulating motility in bacteria) are conserved to- 
gether across species. These observations support the hy- 
pothesis that the conserved components are part of a very 
tightly coupled sub-network which we might identify as a 
module. Bioinformatic tools can then use the combined 
sequence and expression data to give predictions about
modules, as reviewed by Siggia (2005). Purely phylogenetic
approaches that infer module components based
on inter-species comparisons have also been productive
and can extract candidate modules based only on phylogenetic
footprinting, that is, studying the presence or
absence of homologous genes across organisms and correlating
their presence with hand annotated phenotypic
traits (Slonim et al., 2006).



B. Robustness 



Robustness refers to a property of the biological net- 
work such that some aspect of its function is not sen- 
sitive to perturbations of network parameters, environmental
variables (e.g. temperature), or initial state; see
de Visser et al. (2003) for a review of robustness from an
evolutionary perspective and Goulian (2004) for mechanisms
of robustness in bacterial circuits. Robustness
encompasses two very different ideas. One idea has to 
do with a general principle about the nature of expla- 
nation in the quantitative sciences: qualitatively striking 
facts should not depend on the fine tuning of parame- 
ters, because such a scenario just shifts the problem to 
understanding why the parameters are tuned as they are. 
The second idea is more intrinsic to the function of the 
system, and entails the hypothesis that cells cannot rely 
on precisely reproducible parameters or conditions and 
must nonetheless function reliably and reproducibly. 

Robustness has been studied extensively in the chemo- 
tactic system of the bacterium Escherichia coli. The 
systematic bias to swim towards chemoattractants and 
away from repellents can only be sustained if the bac- 
terium is sensitive to the spatial gradients of the concen- 
tration and not to its absolute levels. This discrimina- 
tive ability is ensured by the mechanism of perfect adap- 
tation, with which the proportion of bacterial straight 
runs and tumbles (random changes in direction) always 
returns to the same value in the absence of gradients
(Block et al., 1983). Naively, however, the ability to
adapt perfectly seems to be sensitive to the amounts
of intracellular signaling proteins, which can be tuned
only approximately by means of transcriptional regulation.
Barkai and Leibler (1997) argued that there is integral
feedback control in the chemotactic circuit which
makes it robust against changes in these parameters, and
Alon et al. (1999) showed experimentally that precision
of adaptation truly stays robust, while other properties
of the system (such as the time to adapt and the steady
state) show marked variations with intracellular signaling
protein concentrations.
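
The logic of integral feedback can be illustrated with a deliberately minimal linear model, which is an assumption made here for clarity and not the chemotaxis model of the cited papers. The output y = gain * stimulus - x is compared with a set point, and the feedback variable x integrates the error. The steady-state output then equals the set point for any gain or stimulus, while the time needed to adapt does depend on the parameters.

```python
def adapt(stimulus, gain, setpoint=1.0, feedback_rate=0.5, dt=0.01, t_end=60.0):
    """Euler integration of dx/dt = feedback_rate * (y - setpoint),
    with output y = gain * stimulus - x (toy model, hypothetical parameters)."""
    x = 0.0
    output = []
    for _ in range(int(t_end / dt)):
        y = gain * stimulus - x
        x += feedback_rate * (y - setpoint) * dt
        output.append(y)
    return output

def adaptation_time(output, setpoint=1.0, dt=0.01, fraction=0.05):
    """First time at which the error has decayed to `fraction` of its initial value."""
    initial_error = abs(output[0] - setpoint)
    for i, y in enumerate(output):
        if abs(y - setpoint) <= fraction * initial_error:
            return i * dt
    return None

results = {}
for gain in (0.5, 2.0, 8.0):
    for rate in (0.5, 2.0):
        out = adapt(stimulus=5.0, gain=gain, feedback_rate=rate)
        results[(gain, rate)] = (out[-1], adaptation_time(out))

for key, (final, t_adapt) in sorted(results.items()):
    print(key, round(final, 6), t_adapt)
```

In this sketch the precision of adaptation is a structural property of the circuit, whereas the adaptation time is set by the feedback rate, which mirrors the distinction between robust and non-robust properties drawn above.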

One seemingly clear example of robust biological func- 
tion is embryonic development. We know that the spa- 
tial structure of the fully developed organism follows a
'blueprint' laid out early in development as a spatial pattern
of gene expression levels. von Dassow et al. (2000)
studied one part of this process in the fruit fly Drosophila,
the 'segment polarity network' that generates striped
patterns of expression. They considered a dynamical system
based on the wiring diagram of interactions among
a small group of genes and signaling molecules, with
~50 associated constants parametrizing production and
degradation rates, saturation response and diffusion, and 
searched the parameter space for solutions that repro- 
duce the known striped patterns. They found that, with 
their initial guess at network topology, such solutions do 
not exist, but adding a particular link - biologically mo- 
tivated though unconfirmed at the time - allowed them 
to find solutions by random sampling of parameter space. 
Although they presented no rigorous measure for the vol- 
ume of parameter space in which correct solutions exist, 
it seems that a wide variety of parameter choices and 
initial conditions indeed produce striped expression pat- 
terns, and this was taken to be a signature of robustness. 

Robustness in dynamical models is the ability of the
biological network to sustain its trajectory through state
space despite parameter or state perturbations. In circadian
clocks the oscillations have to be robust against both
molecular noise inherent in transcriptional regulation, examined
in stochastic simulations by Gonze et al. (2002),
as well as variation in rate parameters (Stelling et al.,
2004); in the latter work the authors introduce integral
robustness measures along the trajectory in state space
and argue that the clock network architecture tends to
concentrate the fragility to perturbations into parameters
that are global to the cell (maximum overall translation
and protein degradation rates) while increasing the
robustness to processes specific to the circadian oscillator.
As was mentioned earlier, robustness to state perturbations
was demonstrated by Li et al. (2004) in the
threshold binary network model of the yeast cell cycle,
and examined in scale-free random Boolean networks by
Aldana and Cluzel (2003).



As with modularity, robustness has been somewhat re- 
sistant to rigorous definitions. Importantly, robustness 
has always been used as a relational concept: function 
X is robust to variations in Y. An alternative to robustness
is for the organism to exert precise control over
Y, perhaps even using X as a feedback signal. This
seems to be how neurons stabilize a functional mix of different
ion channels (Marder and Bucher, 2006), following
the original theoretical suggestion of LeMasson et al.
(1993). Pattern formation during embryonic development
in Drosophila begins with spatial gradients of transcription
factors, such as Bicoid, which are established
by maternal gene expression, and it has been assumed
that variations in these expression levels are inevitable,
requiring some robust readout mechanism. Recent measurements
of Bicoid in live embryos, however, demonstrate
that the absolute concentrations are actually reproducible
from embryo to embryo with ~10% precision
(Gregor et al., 2007a). While there remain many
open questions, these results suggest that organisms may
be able to exert surprisingly exact control over critical 
parameters, rather than having compensation schemes 
for initially sloppy mechanisms. The example of ion 
channels alerts us to the possibility that cells may even 
'know' which combinations of parameters are critical, so 
that variations in a multidimensional parameter space 
are large, but confined to a low dimensional manifold. 



C. Noise 

A dynamical system with constant reaction rates, 
starting repeatedly from the same initial condition in a 
stable environment, always follows a deterministic time 
evolution. When the concentrations of the reacting 
species are low enough, however, the description in terms 
of time (and possibly space) dependent concentration 
breaks down, and the stochasticity in reactions, driven 
by random encounters between individual molecules, be- 
comes important: on repeated trials from the same initial
conditions, the system will trace out different trajectories
in the state space. As has been pointed out in
the section on stochastic dynamics, biological networks in
this regime need to be simulated with the Gillespie algorithm
(Gillespie, 1977), or analyzed within approximate
schemes that treat noise as a perturbation of deterministic
dynamics. Recent experimental developments have made
it possible to observe this noise directly, spurring new research
in the field. Noise in biological networks fundamentally
limits the organism's ability to sense, process
and respond to environmental and internal signals, suggesting
that analysis of noise is a crucial component in
any attempt to understand the design of these networks.
This line of reasoning is well developed in the context of
neural function, and we draw attention in
particular to work on the ability of the visual system to
count single photons, which depends upon the precision
of the G-protein mediated signaling cascade in photoreceptors;
see, for example, Ramanathan et al. (2005).

Because transcriptional regulation inherently deals 
with molecules, such as DNA and transcription factors, 
that are present at low copy numbers, most noise studies 
were carried out on transcriptional regulatory elements. 
The availability of fluorescent proteins and their fusions
to wild type proteins has been the crucial tool, enabling
researchers to image the cells expressing these probes in a
controllable manner, and track their number in time and
across the population of cells. Elowitz et al. (2002) pioneered
the idea of observing the output of two identical
regulatory elements driving the expression of two fluorescent
proteins of different colors, regulated by a common
input in a single Escherichia coli cell. In this 'two-color
experiment,' the correlated fluctuations in both colors
must be due to the extrinsic fluctuations in the common
factors that influence the production of both proteins,
such as overall RNA polymerase or transcription factor
levels; on the other hand, the remaining, uncorrelated
fluctuation is due to the intrinsic stochasticity in the
transcription of the gene and translation of the messenger
RNA into the fluorescent protein from each of the two
promoters (Swain et al., 2002). Ozbudak et al. (2002)
have studied the contributions of stochasticity in transcription
and translation to the total noise in gene expression
in prokaryotes, while Pedraza and van Oudenaarden
(2005) and Hooshangi et al. (2005) have looked at the
propagation of noise from transcription factors to their
targets in synthetic multi-gene cascades. Rosenfeld et al.
(2005) have used the statistics of binomial partitioning of
proteins during the division of Escherichia coli to convert 
their fluorescence measurements into the corresponding 
absolute protein concentrations, and also were able to 
observe the dynamics of these fluctuations, characteriz- 
ing the correlation times of both intrinsic and extrinsic 
noise. 
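
The decomposition behind the two-color experiment can be written down directly from paired single-cell measurements. The sketch below uses the standard estimators, in which the intrinsic part comes from the mean squared difference of the two reporters and the extrinsic part from their covariance, and applies them to synthetic data. The synthetic data model (a shared multiplicative factor with 20% fluctuations and independent 10% fluctuations per reporter) and all names are assumptions for illustration only.

```python
import random

def noise_decomposition(c1, c2):
    """Return (intrinsic, extrinsic, total) squared noise strengths from
    paired single-cell reporter levels c1 and c2."""
    n = len(c1)
    m1 = sum(c1) / n
    m2 = sum(c2) / n
    mean_sq_diff = sum((a - b) ** 2 for a, b in zip(c1, c2)) / n
    mean_prod = sum(a * b for a, b in zip(c1, c2)) / n
    mean_sq = sum(a * a + b * b for a, b in zip(c1, c2)) / (2 * n)
    eta_int2 = mean_sq_diff / (2 * m1 * m2)          # uncorrelated part
    eta_ext2 = (mean_prod - m1 * m2) / (m1 * m2)     # correlated part
    eta_tot2 = (mean_sq - m1 * m2) / (m1 * m2)
    return eta_int2, eta_ext2, eta_tot2

# Synthetic cells: a shared (extrinsic) factor multiplies both reporters,
# and each reporter has its own independent (intrinsic) fluctuation.
rng = random.Random(1)
reporter_1, reporter_2 = [], []
for _ in range(20000):
    shared = 1.0 + rng.gauss(0.0, 0.2)
    reporter_1.append(100.0 * shared * (1.0 + rng.gauss(0.0, 0.1)))
    reporter_2.append(100.0 * shared * (1.0 + rng.gauss(0.0, 0.1)))

eta_int2, eta_ext2, eta_tot2 = noise_decomposition(reporter_1, reporter_2)
print(f"intrinsic^2 = {eta_int2:.4f}, extrinsic^2 = {eta_ext2:.4f}, "
      f"total^2 = {eta_tot2:.4f}")
```

With these definitions the intrinsic and extrinsic contributions add up to the total by construction, so the identity holds for any data set and not only for the synthetic one used here.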

Theoretical work has primarily been concerned
with disentangling and quantifying the contributions
of different steps in transcriptional regulation
and gene expression to the total noise in
the regulated gene (Paulsson, 2004; Swain, 2004;
Thattai and van Oudenaarden, 2001), often by looking
for signatures of various noise sources in the behavior of
the measured noise as a function of the mean expression
level of a gene. For many of the examples studied in
prokaryotes, noise seemed to be primarily attributable
to the production of proteins in bursts from single messenger
RNA molecules, and to pulsatile and random activation
of genes and therefore bursty transcription into
mRNA (Golding et al., 2005). In yeast (Blake et al.,
2003; Raser and O'Shea, 2005) and in mammalian cells
(Raj et al., 2006) such stochastic synthesis of mRNA was
modeled and observed as well. Simple scaling of noise
with the mean was observed in ~40 yeast proteins under
different conditions by Bar-Even et al. (2006) and
interpreted as originating in variability in mRNA copy
numbers or gene activation.
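
One way to see where a simple scaling of noise with the mean can come from is the standard two-stage model of gene expression, in which mRNA is made at rate k_m and degraded at rate \gamma_m, and each mRNA is translated at rate k_p into protein that is degraded at rate \gamma_p. The symbols are introduced here for illustration and the model ignores gene activation and extrinsic fluctuations. Under these assumptions the stationary mean and noise are

```latex
\langle p \rangle = \frac{k_m k_p}{\gamma_m \gamma_p},
\qquad
b = \frac{k_p}{\gamma_m},
\qquad
\frac{\sigma_p^2}{\langle p \rangle^2}
  = \frac{1}{\langle p \rangle}
    \left( 1 + \frac{b}{1 + \gamma_p/\gamma_m} \right)
  \approx \frac{1 + b}{\langle p \rangle}
  \quad (\gamma_p \ll \gamma_m).
```

Here b is the mean number of proteins made per mRNA lifetime (the burst size). If the mean is changed by varying the transcription rate k_m at fixed b, the squared noise falls as the inverse of the mean with a prefactor 1 + b, which exceeds the Poisson value of 1.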

Bialek and Setayeshgar (2005) have demonstrated theoretically
that at low concentrations of transcriptional
regulator, there should be a lower bound on the noise set
by the basic physics of diffusion of transcription factor
molecules to the DNA binding sites. This limit is independent
of (possibly complex, and usually unknown)
molecular details of the binding process; as an example,
cooperativity enhances the 'sensitivity' to small changes
in concentration, but doesn't lower the physical limit to
noise performance (Bialek and Setayeshgar, 2006). This
randomness in diffusive flux of factors to their 'detectors'
on the DNA must ultimately limit the precision and reliability
of transcriptional regulation, much like the randomness
in diffusion of chemoattractants to the detectors
on the surface of Escherichia coli limits its chemotactic
performance (Berg and Purcell, 1977). Interestingly,
one dimensional diffusion of transcription factors
along the DNA can have a big impact on the speed
with which TFs find their targets, but the change in
noise performance that one might expect to accompany
these kinetic changes is largely compensated by the extended
correlation structure of one dimensional diffusion
(Tkacik and Bialek, 2007). Recent measurements of the
regulation of the hunchback gene by Bicoid during early
fruit fly development by Gregor et al. (2007a) have provided
evidence for the dominant role of such input noise,
which coexists with previously studied output noise in 
production of mRNA and protein. These results raise 
the possibility that initial decisions in embryonic devel- 
opment are made with a precision limited by fundamental 
physical principles. 
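
The order of magnitude of such a diffusion-limited bound can be estimated from the scaling form in which the fractional error in concentration goes as 1/sqrt(D a c tau), with D the diffusion constant, a the linear size of the binding site, c the concentration and tau the averaging time. The sketch below evaluates this scaling, omitting numerical prefactors of order one. The parameter values are hypothetical round numbers chosen only to show the arithmetic, not measured values from the cited experiments.

```python
import math

def fractional_noise(diffusion, size, concentration, averaging_time):
    """Scaling estimate delta_c / c ~ 1 / sqrt(D * a * c * tau).

    Units (assumed): D in um^2/s, a in um, c in molecules/um^3, tau in s.
    Numerical prefactors of order one are omitted.
    """
    return 1.0 / math.sqrt(diffusion * size * concentration * averaging_time)

def time_for_precision(diffusion, size, concentration, precision):
    """Averaging time needed to reach a given fractional precision."""
    return 1.0 / (diffusion * size * concentration * precision ** 2)

# Hypothetical numbers: D = 1 um^2/s, a = 3 nm, c = 5 molecules/um^3.
D, a, c = 1.0, 0.003, 5.0
noise_one_minute = fractional_noise(D, a, c, averaging_time=60.0)
tau_ten_percent = time_for_precision(D, a, c, precision=0.1)
print(f"fractional noise after 60 s: {noise_one_minute:.2f}")
print(f"time for 10% precision: {tau_ten_percent / 60.0:.0f} min")
```

Because the noise falls only as the inverse square root of the averaging time, a tenfold gain in precision costs a hundredfold increase in integration time, which is why this kind of bound can be restrictive at low concentrations.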



D. Dynamics, attractors, stability and large fluctuations 

The behavior of a dynamical system as the time tends 
to infinity, in response to a particular input, is interest- 
ing regardless of the nature of the network model. Both 
discrete and continuous, or deterministic and noisy, sys- 
tems can settle into a number of fixed points, exhibit 
limit-cycle oscillations, or execute chaotic dynamics. In 
biological networks it is important to ask whether these
qualitatively different outcomes correspond to distinct 
phenotypes or behaviors. If so, then a specific stable gene 
expression profile in a network of developmental genes, 
for example, encodes that cell's developmental fate, as 
the amount of lambda repressor encodes the state of ly- 
sis vs lysogeny switch in the phage. The history of the 
system that led to the establishment of a specific steady 
state would not matter as long as the system persisted in 
the same attractor: the dynamics could be regarded as a 
'computation' leading to the final result, the identity of 
the attractor, with the activities of genes in this steady 
state in turn driving the downstream pathways and other
modules; see Kauffman (1969) for genetic networks; similar
ideas appear in neural networks for
associative memory. Alternatively, such partitioning into
transient dynamics and 'meaningful' steady states might
not be possible: the system must be analyzed as a whole
while it moves in state space, and parts of it do not separately
and sequentially settle into their attractors.
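
The picture of attractors as the outcome of a 'computation' can be illustrated with a two-gene mutual-repression circuit, sketched below as a pair of deterministic rate equations. The dimensionless form, the parameter values (maximal production rate 4, Hill coefficient 2) and the function name are assumptions for illustration. For these values the system has two stable fixed points, and which one is reached depends only on which side of the diagonal x = y the initial condition lies.

```python
def toggle_switch(x0, y0, alpha=4.0, hill=2, dt=0.01, t_end=50.0):
    """Euler integration of a symmetric mutual-repression circuit:
    dx/dt = alpha / (1 + y**hill) - x,  dy/dt = alpha / (1 + x**hill) - y."""
    x, y = x0, y0
    for _ in range(int(t_end / dt)):
        dx = alpha / (1.0 + y ** hill) - x
        dy = alpha / (1.0 + x ** hill) - y
        x, y = x + dx * dt, y + dy * dt
    return x, y

state_a = toggle_switch(2.0, 1.0)   # starts with x > y
state_b = toggle_switch(1.0, 2.0)   # starts with y > x
state_c = toggle_switch(0.5, 0.1)   # different history, same side as state_a

print("x-high attractor:", tuple(round(v, 3) for v in state_a))
print("y-high attractor:", tuple(round(v, 3) for v in state_b))
print("same attractor as the first run:", tuple(round(v, 3) for v in state_c))
```

The first and third runs start from different initial conditions but end in the same state, so in this toy example the identity of the attractor, not the transient, carries the information passed downstream.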

It seems, for example, that qualitative behavior of the 
cell cycle can be understood by progression through well- 
defined states or checkpoints: after transients die away, 
the cell cycle proteins are in a 'consistent' state that reg- 
ulates division or growth related activities, so long as the 
conditions do not warrant a new transition into the next
state (Chen et al., 2000; Nasmyth, 1996). In the fruit fly
Drosophila development it has been suggested that com- 
bined processes of diffusion and degradation first estab- 
lish steady-state spatial profiles of maternal morphogens 
along the major axis of the embryo, after which this 
stable 'coordinate system' is read out by gap and other 
downstream genes to generate the body segments. Recent
measurements by Gregor et al. (2007b) have shown
that there is a rich dynamics in the Bicoid morphogen
concentration, prompting Bergmann et al. (2007) to hypothesize
that perhaps downstream genes read out and
respond to morphogens even before the steady state has 
been reached. On another note, an interesting excitable 
motif, called the "feedback resistor," has been found in 
the HIV Tat system - instead of having a bistable switch like
the lambda phage, HIV (which lacks negative feedback 
capability) implements a circuit with a single stable 'off' 
lysogenic state, that is perturbed in a pulse of transacti- 
vation when the virus attacks. The pulse probably trig- 
gers a threshold-crossing process that drives downstream 
events, and subsequently decays away; the feedback resis- 
tor is thus again an example of a dynamic, as opposed to
the steady-state, readout (Weinberger and Shenk, 2007).
Excitable dynamics are of course at the heart of the action
potential in neurons, which results from the coupled
dynamics of ion channel proteins, and related dynamical
ideas are now emerging in other cellular networks
(Süel et al., 2006).



If attractors of the dynamical system correspond to
distinct biological states of the organism, it is important
to examine their stability against noise-induced spontaneous
flipping. Bistable elements are akin to the 'flip-flop'
switches in computer chips - they form the basis of
cellular (epigenetic) memory. While this mechanism for
remembering the past is not unique - for example, a very
slow, but not bistable, dynamics will also retain 'memory'
of the initial condition through protein levels that
persist on a generation time scale (Sigal et al., 2006) - it
has the potential to be the most stable mechanism. The
naturally occurring bistable switch of the lambda phage
was studied using stochastic simulation by Arkin et al.
(1998), and a synthetic toggle switch was constructed
in Escherichia coli by Gardner et al. (2000). Theoretical
studies of systems where large fluctuations are important
are generally difficult and restricted to simple
regulatory elements, but Bialek (2001) has shown that
a bistable switch can be created with as few as tens of
molecules yet remain stable for years. A full understanding
of such stochastic switching brings in powerful methods
from statistical physics and field theory (Roma et al.,
2005; Sasai and Wolynes, 2003; Walczak et al., 2005),
ultimately with the hope of connecting to quantitative
experiments (Acar et al., 2005).



E. Optimization principles 

If the function of a pathway or a network module can 
be quantified by a scalar measure, it is possible to ex- 
plore the space of networks that perform the given func- 
tion optimally. An example already given was that of 
maximizing the growth rate of the bacterium Escherichia 
coli, subject to the constraints imposed by the known 
metabolic reactions of the cell; the resulting optimal joint 
usage of oxygen and food could be compared to the experiments
(Ibarra et al., 2002). If enough constraints exist
for the problem to be well posed, and there is sufficient
reason to believe that evolution drove the organism
towards optimal behavior, optimization principles allow 
us to both tune the otherwise unknown parameters to 
achieve the maximum, and also to compare the wild type 
and optimal performances. 

Dekel and Alon (2005) have performed a
cost/benefit analysis of expressing the lac operon in
bacteria. On one hand lac genes allow Escherichia coli
to digest lactose, but on the other there is the incurred 
metabolic cost to the cell for expressing them. That the 
cost is not negligible to the bacterium is demonstrated 
best by the fact that it shuts off the operon if no lactose 
is present in the environment. The cost terms are 
measured by inducing the lac operon with variable
amounts of IPTG, which provides no energy in return; the
benefit is measured by fully inducing lac with IPTG 
and supplying variable amounts of lactose; both cost 
and benefit are in turn expressed as the change in the 
growth rate compared to the wild-type grown at fixed 
conditions. Optimal levels of lac expression were then 
predicted as a function of lactose concentration and 
bacteria were evolved for several hundred generations to 
verify that evolved organisms lie close to the predicted 
optimum.
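
The structure of such a cost/benefit calculation can be sketched with simple functional forms. In the example below the benefit grows linearly with expression level Z and saturates with lactose concentration L, while the cost rises faster than linearly as Z approaches a maximal capacity. These functional forms and every parameter value are illustrative assumptions and should not be read as the measured cost and benefit functions of the cited study.

```python
import math

def growth_advantage(z, lactose, delta=0.17, eta=0.02, z_max=1.8, k_half=0.4):
    """Relative growth rate change: benefit minus cost (hypothetical forms)."""
    benefit = delta * z * lactose / (k_half + lactose)
    cost = eta * z / (1.0 - z / z_max)
    return benefit - cost

def optimal_expression_grid(lactose, z_max=1.8, points=2000):
    """Brute-force maximization over expression levels 0 <= z < z_max."""
    grid = [i * z_max / points for i in range(points)]
    return max(grid, key=lambda z: growth_advantage(z, lactose, z_max=z_max))

def optimal_expression_analytic(lactose, delta=0.17, eta=0.02, z_max=1.8, k_half=0.4):
    """Setting d(benefit - cost)/dz = 0 gives z* = z_max * (1 - sqrt(ratio))."""
    ratio = eta * (k_half + lactose) / (delta * lactose)
    return max(0.0, z_max * (1.0 - math.sqrt(ratio)))

for lactose in (0.05, 0.5, 5.0):
    z_grid = optimal_expression_grid(lactose)
    z_exact = optimal_expression_analytic(lactose)
    print(f"L = {lactose}: grid optimum {z_grid:.3f}, analytic optimum {z_exact:.3f}")
```

With these assumed forms the predicted optimum is zero below a threshold lactose level, where the marginal benefit never exceeds the marginal cost, and rises toward the maximal capacity as lactose becomes abundant.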

Zaslaver et al. have considered a cascade of
amino-acid biosynthesis reactions in Escherichia coli, catalyzed
by their corresponding enzymes. They have then
optimized the parameters of the model that describes the 
regulation of enzyme gene expression, such that the to- 
tal metabolic cost for enzyme production was balanced 
against the benefit of achieving a desired metabolic flux 
through the biosynthesis pathway. The resulting opti- 
mal on-times and promoter activities for the enzymes 
were compared to the measured activities of amino-acid 
biosynthesis promoters exposed to different amino-acids 
in the medium. The authors conclude that the bacterium 
implements a 'just-in-time' transcription program, with 
enzymes catalyzing initial steps in the pathway being 
produced from strong and low-latency promoters. 

In signal transduction networks the definition of an 
objective function to be maximized is somewhat more 
tricky. The ability of the cell to sense its environment and 
make decisions, for instance about which genes to up- or 
down-regulate, is limited by several factors: scarcity of 
signals coming from the environment, perhaps because 
of the limited time that can be dedicated to data col- 
lection; noise inherent in the signaling network that de- 
grades the quality of the detected signal; (sub-)optimality 
of the decision strategy; and noise in the effector systems 
at the output. A first idea would be to postulate that 
networks are designed to lower the noise, and intuitively 
the ubiquity of mechanisms such as negative feedback
(Becskei and Serrano, 2000; Goulian, 2004) is consistent
with such an objective. There are various definitions for 
noise, however, which in addition are generally a function 
of the input, raising serious issues about how to formulate 
a principled optimization criterion. 

When we think about energy flow in biological sys- 
tems, there is no doubt that our thinking must at least 
be consistent with thermodynamics. More strongly, ther- 
modynamics provides us with notions of efficiency that 
place the performance of biological systems on an ab- 
solute scale, and in many cases this performance really 
is quite impressive. In contrast, most discussions of in- 
formation in biological systems leave "information" as a 
colloquial term, making no reference to the formal apparatus
of information theory as developed by Shannon
and others more than fifty years ago (Shannon, 1948).
Although many aspects of information theory that are 
especially important for modern technology (e.g., sophis- 
ticated error-correcting codes) have no obvious connec- 
tion to biology, there is something at the core of infor- 
mation theory that is vital: Shannon proved that if we 
want to quantify the intuitive concept that "x provides
information about y," then there is only one way to do 
this that is guaranteed to work under all conditions and 
to obey simple intuitive criteria such as the additivity of 
independent information. This unique measure of "in- 
formation" is Shannon's mutual information. Further, 
there are theorems in information theory which, in parallel
to results in thermodynamics, provide us with limits
to what is possible and with notions of efficiency.
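
For discrete variables the mutual information is I(x; y) = sum over x, y of p(x, y) log2 [ p(x, y) / (p(x) p(y)) ], measured in bits. The short sketch below evaluates this definition for three hypothetical joint distributions of a binary input and a binary output; the function name and the example tables are assumptions for illustration.

```python
import math

def mutual_information(joint):
    """Mutual information in bits for a joint probability table joint[i][j]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    info = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                      # terms with p = 0 contribute nothing
                info += p * math.log2(p / (px[i] * py[j]))
    return info

independent = [[0.25, 0.25], [0.25, 0.25]]   # output ignores the input
noiseless = [[0.5, 0.0], [0.0, 0.5]]         # output copies the input
noisy = [[0.45, 0.05], [0.05, 0.45]]         # output is wrong 10% of the time

info_independent = mutual_information(independent)
info_noiseless = mutual_information(noiseless)
info_noisy = mutual_information(noisy)
print(info_independent, info_noiseless, round(info_noisy, 3))
```

The noisy case falls between zero and one bit, which illustrates how noise in a signaling element caps the information its output can carry about its input.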

There is a long history of using information theoretic 
ideas to analyze the flow of information in the nervous 
system, including the idea that aspects of the brain's 
coding strategies might be chosen to optimize the effi- 
ciency of coding, and these theoretical ideas have led directly
to interesting experiments. The use of information
to think about cellular signaling and its possible optimization
is more recent (Tkacik et al., 2007a; Ziv et al.,
2006). An important aspect of optimizing information
flow is that the input/output relations of signaling devices
must be matched to the distribution of inputs, and
recent measurements on the control of hunchback by Bicoid
in the early fruit fly embryo (Gregor et al., 2007a)
seem remarkably consistent with the (parameter free)
predictions from these matching relations (Tkacik et al.,
2007b).

In the context of neuroscience there is a long tradi- 
tion of forcing the complex dynamics of signal process- 
ing into a setting where the subject needs to decide be- 
tween a small set of alternatives; in this limit there is a 
well developed theory of optimal Bayesian decision mak- 
ing, which uses prior knowledge of the possible signals 
to help overcome noise intrinsic to the signaling system;
Libby et al. (2007) have recently applied this approach
to the lac operon in Escherichia coli. The regulatory 
element is viewed as an inference module that has to 'de- 
cide,' by choosing its induction level, if the environmental 
lactose concentration is high or low. If the bacterium de- 
tects a momentarily high sugar concentration, it has to 
discriminate between two situations: either the environ- 
ment really is at low overall concentration but there has 
been a large fluctuation; or the environment has switched 
to a high concentration mode. The authors examine how 
plausible regulatory element architectures (e.g. activa- 
tor vs repressor, cooperative binding etc.) yield differ- 
ent discrimination performance. Intrinsic noise in the lac 
system can additionally complicate such decision mak- 
ing, but can be included into the theoretical Bayesian 
framework. 
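
The two-alternative decision described above can be sketched as a Bayesian update. In the example below the environment is either in a 'low' or a 'high' state, each measurement of sugar concentration is the true mean plus Gaussian noise, and the prior encodes the belief that the high state is rare. The Gaussian noise model, the numbers and the function name are assumptions made here for illustration and are not the model of the cited work.

```python
import math

def posterior_high(measurements, prior_high=0.1,
                   mean_low=1.0, mean_high=3.0, sd=1.0):
    """Posterior probability of the 'high' environment after a series of
    independent noisy measurements (Gaussian likelihoods, equal variance)."""
    log_odds = math.log(prior_high / (1.0 - prior_high))
    for m in measurements:
        # log-likelihood ratio log[ N(m; mean_high) / N(m; mean_low) ]
        log_odds += ((m - mean_low) ** 2 - (m - mean_high) ** 2) / (2.0 * sd ** 2)
    return 1.0 / (1.0 + math.exp(-log_odds))

p_prior = posterior_high([])
p_one_high_reading = posterior_high([3.0])
p_three_high_readings = posterior_high([3.0, 3.0, 3.0])
print(round(p_prior, 3), round(p_one_high_reading, 3),
      round(p_three_high_readings, 3))
```

With these assumed numbers a single high reading is not enough to overturn the prior, since it remains more likely to be a fluctuation in a low environment, whereas a few consecutive high readings make a switch of the environment the better explanation.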

The question of whether biological systems are optimal 
in any precise mathematical sense is likely to remain con- 
troversial for some time. Currently opinions are stronger 
than the data, with some investigators using 'optimized' 
rather loosely and others convinced that what we see to- 
day is only a historical accident, not organizable around 
such lofty principles. We emphasize, however, that at- 
tempts to formulate optimization principles require us 
to articulate clearly what we mean by "function" in each 
context, and this is an important exercise. Exploration of 
optimization principles also exposes new questions, such 
as the nature of the distribution of inputs to signaling 
systems, that one might not have thought to ask other- 
wise. Many of these questions remain as challenges for a 
new generation of experiments. 

F. Evolvability and designability



Kirschner and Gerhart (1998) define evolvability as
an organism's capacity to generate heritable phenotypic 
variation. This capacity may have two components: first, 
to reduce the lethality of mutations, and second, to re- 
duce the number of mutations needed to produce phe- 
notypically novel traits. The systematic study of evolv- 
ability is hard because the genotype-to-phenotype map is 
highly non-trivial, but there have been some qualitative 
observations relevant to biological networks. Emergence 
of weak linkage of processes, such as the co-dependence 
of transcription factors and their DNA binding sites 
in metazoan transcriptional regulation, is one example. 
Metazoan regulation seems to depend on combinatorial 
control by many transcription factors with weak DNA- 
binding specificities and the corresponding binding sites 
(called cis-regulatory modules) can be dispersed and ex- 
tended on the DNA. This is in stark contrast to the strong 
linkage between the factors and the DNA in prokaryotic 
regulation or in metabolism, energy transfer or macro- 
molecular assembly, where steric and complementarity 
requirements for interacting molecules are high. In pro- 
tein signaling networks, strongly conserved but flexible 
proteins, like calmodulin, can bind weakly to many other 
proteins, with small mutations in their sequence proba- 
bly affecting such binding and making the establishment 
of new regulatory links possible and perhaps easy. 

Some of the most detailed attempts to follow the evolution
of network function have been by Francois and
coworkers (Francois and Hakim, 2004; Francois et al.,
2007). In their initial work they showed how simple functional
circuits, performing logical operations or imple-
menting bistable or oscillatory behavior, can be reliably 
created by a mutational process with selection by an ap- 
propriate fitness function. More recently they have con- 
sidered fitness functions which favor spatial structure in 
patterns of gene expression, and shown how the networks 
that emerge from dynamics in this fitness landscape reca- 
pitulate the outlines of the segmentation networks known 
to be operating during embryonic development. 

Instead of asking if there exists a network of nodes 
such that they perform a given computation, and if it can 
be found by mutation and selection as in the examples 
above, one can ask how many network topologies per- 
form a given computation. In other words, one is asking 
whether there is only one (fine tuned?) or many topolo- 
gies or solutions to a given problem. The question of how 
many network topologies, proxies for different genotypes, 
produce the same dynamics, a proxy for phenotype, is a 
question of designability, a concept originally proposed 
to study the properties of amino-acid sequences comprising
functional proteins, but applicable also to biological
regulatory networks (Nochomovitz and Li, 2006). The
authors examine three- and four-node binary networks 
with a threshold updating rule and show that all networks
with the shared phenotype have a common 'core' set of 
connections, but can differ in the variable part, similar
to protein folding where the essential set of residues is
necessary for the fold, with numerous variations in the 
nonessential part. 
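
The counting behind designability can be made concrete for
three-node networks. The sketch below is only an illustration
under our own assumptions, not a reproduction of the published
analysis: weights are restricted to -1, 0 or +1, a node keeps its
state when its summed input is exactly zero, and the 'phenotype'
is taken to be the full synchronous state-transition map. The
helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product

N = 3  # number of nodes
STATES = list(product((0, 1), repeat=N))  # all 2**N binary states


def step(weights, state):
    """Threshold update; weights[i * N + j] is the effect of node j on node i."""
    new = []
    for i in range(N):
        total = sum(weights[i * N + j] * state[j] for j in range(N))
        # Assumed tie rule: zero net input leaves the node unchanged.
        new.append(1 if total > 0 else 0 if total < 0 else state[i])
    return tuple(new)


def phenotype(weights):
    """The dynamics of a topology: the image of every state under one update."""
    return tuple(step(weights, s) for s in STATES)


def group_by_phenotype():
    """Map each distinct dynamics to the list of topologies that produce it."""
    groups = defaultdict(list)
    for weights in product((-1, 0, 1), repeat=N * N):
        groups[phenotype(weights)].append(weights)
    return groups


def core_links(topologies):
    """Links (i, j) whose weight is identical in every topology of a group."""
    first = topologies[0]
    return {
        (k // N, k % N): first[k]
        for k in range(N * N)
        if all(t[k] == first[k] for t in topologies)
    }


groups = group_by_phenotype()
# Designability of a phenotype: how many topologies realize it.
designability = {p: len(ts) for p, ts in groups.items()}
most_designable = max(groups, key=lambda p: len(groups[p]))
shared_core = core_links(groups[most_designable])
```

Here designability[p] is the number of topologies that realize
the dynamics p, and core_links returns the connections that are
fixed across all of them, the analogue of the 'core' described
above. Changing the tie rule or the definition of phenotype will
change the counts.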



V. FUTURE PROSPECTS 

The study of biological networks is at an early stage,
on both the theoretical and the experimental
side. Although high-throughput experiments are gener- 
ating large datasets, these can suffer from serious biases, 
lack of temporal or spatial detail, and limited access to 
the component parts of the interacting system. On the
theoretical front, general analytical insights that would link
dynamics with network topology are few, although for 
specific systems with known topology computer simula- 
tion can be of great assistance. There can be confusion 
about which aspects of the dynamical model have bio- 
logical significance and interpretation, and which aspects 
are just 'temporary variables' and the 'envelope' of the 
proverbial back-of-the-envelope calculations that cells use 
to perform their biological computations on; which parts 
of the trajectory are functionally constrained and which 
ones could fluctuate considerably with no ill-effects; how 
much noise is tolerable in the nodes of the network and 
what is its correlation structure; or how the unobserved, 
or 'hidden', nodes (or their modification/activity states) 
influence the network dynamics. 

Despite these caveats, cellular networks have some
advantages over biological systems of comparable
complexity, such as neural networks. Thanks to technological
developments, we are considerably closer to a complete
census of the interacting molecules in a cell than we are
to a picture of the connectivity of neural
tissue. Components of the regulatory networks are simpler
than neurons, which are capable of a range of
complicated behaviors on different timescales. Modules and
pathways often comprise a smaller number of interacting
elements than in neural networks, making it possible to 
design small but interesting synthetic circuits. Last but 
not least, sequence and homology can provide strong in- 
sights or be powerful tools for network inference in their 
own right. 

Those of us who come from the traditionally quanti- 
tative sciences, such as physics, were raised with exper- 
iments in which crucial elements are isolated and con- 
trolled. In biological systems, attempts at such isolation 
may break the regulatory mechanisms that are essential 
for normal operation of the system, leaving us with a 
system which is in fact more variable and less controlled
than we would have if we faced the full complexity of 
the organism. It is only recently that we have seen the 
development of experimental techniques that allow fully 
quantitative, real time measurements of the molecular 
events inside individual cells, and the theoretical
framework into which such measurements will be fit is still
being constructed. The range of theoretical approaches
being explored is diverse, and it behooves us to search 
for those approaches which have the chance to organize
our understanding of many different systems rather than 
being satisfied with models of particular systems. Again, 
there is a balance between the search for generality and 
the need to connect with experiments on specific net- 
works. We have tried to give some examples of all these 
developments, hopefully conveying the correct combina- 
tion of enthusiasm and skepticism. 